Computational Methods

In this final part, just as we introduced causal inference last week, we will introduce computational methods that are frequently used in the social sciences and look at examples of each. The methods are:

  • Web Scraping
  • Text Analysis
  • Sentiment Analysis
  • Network Analysis
  • Spatial Analysis
  • Machine Learning

All of these have become widespread in the social sciences in recent years, and depending on the topic being studied they can be very important. That is why a basic, hands-on introduction is worthwhile.

General packages

library(tidyverse)
library(hms)
library(stringr)
library(writexl)
library(readxl)
library(lubridate)

Scraping packages

library(rvest)
library(xml2)
library(XML)
library(httr)

Text packages

library(tidytext)
library(stopwords)
library(ggwordcloud)
library(seededlda)
library(glmnet)
library(caret)

Web Scraping

Before scraping a site, we first look at its robots.txt file. It tells us which pages may or may not be scraped. Let's try this with an example.
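
As a rough illustration of what reading robots.txt involves, the sketch below parses a made-up robots.txt and checks whether a path falls under a Disallow rule that applies to all user agents. The file contents, the paths and the helper names `disallowed_for_all` and `is_disallowed` are hypothetical, and real robots.txt files can contain more (Allow lines, wildcards, per-agent sections) than this simple version handles.

```r
# A made-up robots.txt; a real one could be read with readLines() from the site's /robots.txt
robots_txt <- c(
  "User-agent: *",
  "Disallow: /admin/",
  "Disallow: /search",
  "",
  "User-agent: SomeBot",
  "Disallow: /"
)

# Collect the Disallow paths listed under the "User-agent: *" section
disallowed_for_all <- function(lines) {
  agent <- NA_character_
  paths <- character(0)
  for (line in trimws(lines)) {
    if (grepl("^User-agent:", line, ignore.case = TRUE)) {
      agent <- trimws(sub("^User-agent:", "", line, ignore.case = TRUE))
    } else if (grepl("^Disallow:", line, ignore.case = TRUE) && identical(agent, "*")) {
      path <- trimws(sub("^Disallow:", "", line, ignore.case = TRUE))
      if (nzchar(path)) paths <- c(paths, path)
    }
  }
  paths
}

# TRUE when the path starts with any disallowed prefix
is_disallowed <- function(path, rules) {
  any(startsWith(path, rules))
}

rules <- disallowed_for_all(robots_txt)
is_disallowed("/admin/login", rules)  # TRUE
is_disallowed("/gundem/", rules)      # FALSE
```

Only prefix matching is implemented here, which is enough to decide whether a section of a site is off limits before writing any scraping code.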

Suppose we want to scrape data from the Sabah newspaper's website. We first check robots.txt. Next we need to find the page from which the data can be pulled; searching within the site is one way to do this. There are different methods for pulling the data. The first is scraping the HTML. Almost every website is built on an HTML structure, and using it you can pull images, text or tables from the site.

Parts such as h1 and p are called nodes. They are the most important pieces for scraping because all the data sits under them. By inspecting the page structure, or with the help of certain browser extensions, we can work out what is stored where. Let's see an example.

url <- "http://books.toscrape.com/index.html"
webpage <- read_html(url)
titles <- webpage %>%
  html_nodes("h3 a") %>%
  html_attr("title")
titles
##  [1] "A Light in the Attic"                                                                          
##  [2] "Tipping the Velvet"                                                                            
##  [3] "Soumission"                                                                                    
##  [4] "Sharp Objects"                                                                                 
##  [5] "Sapiens: A Brief History of Humankind"                                                         
##  [6] "The Requiem Red"                                                                               
##  [7] "The Dirty Little Secrets of Getting Your Dream Job"                                            
##  [8] "The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull"       
##  [9] "The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics"
## [10] "The Black Maria"                                                                               
## [11] "Starving Hearts (Triangular Trade Trilogy, #1)"                                                
## [12] "Shakespeare's Sonnets"                                                                         
## [13] "Set Me Free"                                                                                   
## [14] "Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)"                                       
## [15] "Rip it Up and Start Again"                                                                     
## [16] "Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991"            
## [17] "Olio"                                                                                          
## [18] "Mesaerion: The Best Science Fiction Stories 1800-1849"                                         
## [19] "Libertarianism for Beginners"                                                                  
## [20] "It's Only the Himalayas"
prices <- webpage %>%
  html_nodes(".price_color") %>%
  html_text() %>%
  sub("£", "", .) %>% 
  as.numeric()   
prices
##  [1] 51.77 53.74 50.10 47.82 54.23 22.65 33.34 17.93 22.60 52.15 13.99 20.66
## [13] 17.46 52.29 35.02 57.25 23.88 37.59 51.33 45.17
ratings <- webpage %>%
  html_nodes(".star-rating") %>%
  html_attr("class") %>%
  sub("star-rating ", "", .)
ratings
##  [1] "Three" "One"   "One"   "Four"  "Five"  "One"   "Four"  "Three" "Four" 
## [10] "One"   "Two"   "Four"  "Five"  "Five"  "Five"  "Three" "One"   "One"  
## [19] "Two"   "Two"
books_df <- data.frame(
  Title = titles,
  Price = prices,
  Rating = ratings,
  stringsAsFactors = FALSE
)

Tables on websites can be scraped in a similar way. Here the table comes out in the same form in which we see it on the site.

url2 <- "https://en.wikipedia.org/wiki/List_of_largest_companies_in_the_United_States_by_revenue"
webpage2 <- read_html(url2)
tables <- webpage2 %>% html_nodes("table.wikitable")
companies_table <- tables[[1]] %>% html_table()

We can also scrape with XPath. XPath is a powerful tool for working with HTML and XML documents. We can use it both to learn where things sit in the HTML and to address each object by its own location.

url4 <- "http://quotes.toscrape.com/"
webpage4 <- read_html(url4)
quotes <- webpage4 %>%
  html_nodes(xpath = '/html/body/div[1]/div[2]/div[1]/div[1]/span[1]') %>%
  html_text() %>%
  gsub("^\\s+|\\s+$", "", .)
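
An absolute path like the one above selects a single node and depends on the exact nesting of the page, so it tends to break when the layout changes. A relative XPath that selects by attribute is often sturdier. The sketch below uses a small made-up page; the class names imitate a quotes listing but are assumptions, not taken from the real site.

```r
library(xml2)

# A small made-up page parsed from a literal string, so the example runs offline
page <- read_html(paste0(
  '<html><body>',
  '<div class="quote"><span class="text">First quote</span><small class="author">A. Author</small></div>',
  '<div class="quote"><span class="text">Second quote</span><small class="author">B. Author</small></div>',
  '</body></html>'
))

# Absolute path: picks exactly one node and relies on the full nesting
first_quote <- xml_text(xml_find_all(page, "/html/body/div[1]/span[1]"))

# Relative path: picks every span whose class is "text", wherever it sits
all_quotes <- xml_text(xml_find_all(page, "//span[@class='text']"))

first_quote
all_quotes
```

The relative version returns all matching quotes at once, which is usually what we want when collecting data rather than inspecting one element.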

Instead of making manual selections like this, we can also pull data in bulk. Returning to the Sabah newspaper example:

sitemap_urls <- c("https://www.sabah.com.tr/sitemaparchives/post/2023-1.xml",
                  "https://www.sabah.com.tr/sitemaparchives/post/2023-2.xml",
                  "https://www.sabah.com.tr/sitemaparchives/post/2023-3.xml",
                  "https://www.sabah.com.tr/sitemaparchives/post/2022-12.xml")

Let's pull the stories published in Sabah in the first 3 months of 2023.

df_sabah <- map_dfr(sitemap_urls, function(url) {
  sabah_data <- read_xml(url)
  sabah_urls <- xml_text(xml_find_all(sabah_data, "//d1:loc"))
  sabah_dates <- xml_text(xml_find_all(sabah_data, "//d1:lastmod"))
  sabah_links <- tibble(urls = sabah_urls, dates = sabah_dates)
  sabah_links_2 <- sabah_links %>%
    filter(str_detect(dates, "^2023-(01|02|03)")) %>%
    mutate(dates = substr(dates, 1, 10))
  return(sabah_links_2)
})
num_stories <- nrow(df_sabah)
sprintf("In the first 3 months of 2023, Sabah published %d stories.", num_stories)
## [1] "In the first 3 months of 2023, Sabah published 47296 stories."
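
The `d1:` prefix in the XPath queries above may look odd. Sitemap files declare a default XML namespace, and xml2 exposes that namespace under the prefix `d1`, so element names have to be written as `d1:loc` and `d1:lastmod`. The sketch below shows this on a tiny made-up sitemap; the URLs and dates are invented for illustration.

```r
library(xml2)

# A made-up sitemap with the standard sitemap namespace as the default namespace
sitemap <- read_xml(paste0(
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
  '<url><loc>https://example.com/a</loc><lastmod>2023-01-05T10:00:00+03:00</lastmod></url>',
  '<url><loc>https://example.com/b</loc><lastmod>2022-12-30T09:00:00+03:00</lastmod></url>',
  '</urlset>'
))

xml_ns(sitemap)  # lists the d1 prefix assigned to the default namespace

locs  <- xml_text(xml_find_all(sitemap, "//d1:loc"))
dates <- substr(xml_text(xml_find_all(sitemap, "//d1:lastmod")), 1, 10)

# Keep only entries dated January to March 2023
keep <- grepl("^2023-(01|02|03)", dates)
locs[keep]

# Without the prefix the query finds nothing
length(xml_find_all(sitemap, "//loc"))
```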

Text Analysis

Next, let's categorize these stories.

df_sabah <- df_sabah %>%
  mutate(category = str_extract(urls, "(?<=sabah.com.tr\\/)[^\\/]+")) %>%
  mutate(category = case_when(
    category %in% c("dunya") ~ "World",
    category %in% c("egitim") ~ "Education",
    category %in% c("ekonomi", "finans") ~ "Economy",
    category %in% c("gundem") ~ "Politics",
    category %in% c("kultur-sanat", "magazin", "medya") ~ "Entertainment",
    category %in% c("spor") ~ "Sports",
    category %in% c("yazarlar") ~ "Opinion",
    TRUE ~ "Others"
  ))
df_sabah %>%
  count(category) %>%
  mutate(category = reorder(category, -n)) %>%
  ggplot(aes(x = category, y = n, fill = category)) +
  geom_bar(stat = "identity") +
  ggtitle("Number of News by Category in Sabah Newspaper 2023-01/03") +
  xlab("Category") +
  ylab("Count") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  
  theme(legend.position = "none")
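
To see what the categorization step does on a small scale, here is a sketch with three made-up URLs shaped like the sitemap entries. It pulls out the first path segment with a lookbehind and maps it to a label through a lookup vector, which is an alternative to `case_when()` and not the method used above. The URLs and the `lookup` vector are hypothetical.

```r
library(stringr)

# Made-up URLs in the same shape as the sitemap entries
urls <- c("https://www.sabah.com.tr/gundem/ornek-haber",
          "https://www.sabah.com.tr/spor/ornek-mac",
          "https://www.sabah.com.tr/yasam/ornek-yazi")

# Lookbehind: match the path segment that comes right after the domain
section <- str_extract(urls, "(?<=sabah\\.com\\.tr/)[^/]+")

# Named lookup vector; any section that is not listed falls back to "Others"
lookup <- c(gundem = "Politics", spor = "Sports")
category <- unname(ifelse(section %in% names(lookup), lookup[section], "Others"))

section
category
```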

Let's check how many times Kılıçdaroğlu and how many times Erdoğan appear in the headlines. The code below searches the story URLs for each name.

count_df <- df_sabah %>%
  mutate(names = str_extract_all(urls, "erdogan|kilicdaroglu")) %>%
  unnest(names) %>%
  count(names)
sabah_stories <- df_sabah %>%
  filter(str_detect(urls, "kilicdaroglu|erdogan")) %>%
  distinct(urls, .keep_all = TRUE)
invisible(any(duplicated(sabah_stories$urls)))
paste0("Erdogan appears ", count_df$n[1], 
       " times, and Kilicdaroglu appears ", count_df$n[2], " times.")
## [1] "Erdogan appears 1020 times, and Kilicdaroglu appears 314 times."

Let's analyze the content of the stories.

sabah_stories <- sabah_stories %>%
  filter(category == "Politics")
sabah_stories_combined <- data.frame(title = rep(NA, nrow(sabah_stories)),
                                     publish_date = rep(NA, nrow(sabah_stories)),
                                     main_text = rep(NA, nrow(sabah_stories)),
                                     erdogan = rep(NA, nrow(sabah_stories)),
                                     url = rep(NA, nrow(sabah_stories)),
                                     stringsAsFactors = FALSE)
sabah_stories_combined <- sabah_stories_combined %>%
  mutate(publish_date = sabah_stories$dates) %>%
  mutate(urls = sabah_stories$urls) %>%
  mutate(erdogan = ifelse(grepl('erdogan', urls), 1, 0)) %>%
  select(title, publish_date, main_text, erdogan, urls)

Let's extract both the headline and the text of each story.

for (i in seq_along(sabah_stories$urls)) {
  # get HTML content
  html <- read_html(sabah_stories$urls[i])
  
  # Extract title
  title <- html %>% html_nodes("h1.pageTitle") %>% html_text()
  
  # Extract main text
  main_text_p <- html %>%
    html_nodes("p") %>%
    html_text()
  
  if (length(main_text_p) > 0) {  
    main_text <- main_text_p
  } else { 
    main_text_newsBox <- html %>%
      html_nodes(".newsBox.selectionShareable") %>%
      html_text()
    main_text <- main_text_newsBox
  }
  
  
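  # Guard against pages without an h1.pageTitle node, which would otherwise stop the loop
  sabah_stories_combined$title[i] <- if (length(title) > 0) title[1] else NA_character_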
  sabah_stories_combined$main_text[i] <- paste(main_text, collapse = "\n")
  
  
  Sys.sleep(7)
}
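
A long scraping loop like this stops at the first page that fails to load. One common way to make it more forgiving is to wrap each request in `tryCatch()` and keep the pause between requests. The sketch below shows the pattern with a fake fetcher so that it runs offline; `fetch_all` and `fake_fetch` are hypothetical helpers, and in real use `fetch_fun` would be something like `read_html`.

```r
# fetch_fun stands in for read_html(); pause is the delay between requests in seconds
fetch_all <- function(urls, fetch_fun, pause = 0) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    results[[i]] <- tryCatch(
      fetch_fun(urls[i]),
      error = function(e) {
        # Report the failure and record NA instead of stopping the whole loop
        message("Skipping ", urls[i], ": ", conditionMessage(e))
        NA_character_
      }
    )
    Sys.sleep(pause)
  }
  results
}

# A fake fetcher that fails on one URL
fake_fetch <- function(url) {
  if (grepl("broken", url)) stop("HTTP 404")
  paste("content of", url)
}

pages <- fetch_all(c("https://example.com/ok", "https://example.com/broken"), fake_fetch)
pages
```

Failed pages show up as NA in the result, so they can be retried later without repeating the successful requests.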

Let's save the result and read it back in.

write_csv(sabah_stories_combined, "sabahstories.csv")
sabah_stories_combined <- read_csv("sabahstories.csv")
## Rows: 609 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): title, main_text, urls
## dbl  (1): erdogan
## date (1): publish_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sabah_stories_combined$publish_date <- as.Date(sabah_stories_combined$publish_date)

Let's look at the stories published per day.

counts <- sabah_stories_combined %>%
  mutate(date = as.Date(publish_date, format = "%Y-%m-%d")) %>%
  group_by(date) %>%
  summarise(erdogan_count = sum(grepl("erdogan", urls, ignore.case = TRUE)),
            kilicdaroglu_count = sum(grepl("kilicdaroglu", urls, ignore.case = TRUE))) %>%
  ungroup()

Let's strip punctuation from the headlines, convert them to lowercase, and flag the stories that mention Kılıçdaroğlu in a separate variable.

sabah_stories_combined$title2 <- 
  str_to_lower(str_replace_all(sabah_stories_combined$title, 
                               "[[:punct:]]", ""))
sabah_stories_combined <- sabah_stories_combined %>% 
  mutate(kilicdaroglu = ifelse(grepl('kilicdaroglu', urls), 1, 0))
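
Lowercasing Turkish text has one trap worth knowing. With the default English locale, the dotted capital İ becomes an "i" followed by a separate combining dot, so a word like "DAKİKA" does not turn into plain "dakika". The sketch below shows the difference; passing `locale = "tr"` is offered as an option to consider, not as what the code above does.

```r
library(stringr)

word <- "DAK\u0130KA"  # "DAKİKA" written with the dotted capital I (U+0130)

default_lower <- str_to_lower(word)                 # default locale is "en"
turkish_lower <- str_to_lower(word, locale = "tr")  # Turkish casing rules

nchar(default_lower)       # 7: "i" plus a combining dot (U+0307) in place of the dotted I
nchar(turkish_lower)       # 6
turkish_lower == "dakika"  # TRUE
```

This is why word filters applied to text lowercased with the default locale may need a second spelling of the same word that carries the extra dot.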

Let's add stopwords. R has no built-in Turkish stopword list, so we use a separate package.

tr_stopwords <- stopwords::stopwords("tr", source = "stopwords-iso")
tr_stopwords <- tr_stopwords[tr_stopwords != "iyi"]

Word Frequencies

Let's count the words in the headlines that mention Erdoğan.

erdogan_words <- sabah_stories_combined %>%
  filter(erdogan == 1) %>%
  select(title2) %>%
  unnest_tokens(word, title2) %>%
  filter(!word %in% c("son", "dakika", 
                      tr_stopwords, "recep", "tayyip", "daki̇ka")
         & !grepl("erdoğan", word)) %>%
  count(word, sort = TRUE)

Let's do the same for Kılıçdaroğlu.

kilicdaroglu_words <- sabah_stories_combined %>%
  filter(kilicdaroglu == 1) %>%
  select(title2) %>%
  unnest_tokens(word, title2) %>%
  filter(!word %in% c("son", "dakika", 
                      tr_stopwords, "kemal", "daki̇ka") &
           !grepl("kılıçdaroğlu", word)) %>%
  count(word, sort = TRUE)

Let's pull out the 15 most frequent words for each.

top_erdogan_words <- erdogan_words %>%
  slice_head(n = 15)
top_kilicdaroglu_words <- kilicdaroglu_words %>%
  slice_head(n = 15)
erdogan_plot <- ggplot(top_erdogan_words, 
                       aes(x = reorder(word, n), y = n)) +
  geom_bar(stat = "identity", fill = "orange") +
  # labs() names the aesthetics, so x is the word and y is the count even after coord_flip()
  labs(title = "Most Common Words in Erdogan Titles", 
       x = "Word", y = "Count") +
  theme_minimal() +
  theme(axis.text.y = element_text(angle = 0, hjust = 1)) +
  coord_flip()
ggsave("erdogan.png", plot = erdogan_plot, dpi = 300)

kilicdaroglu_plot <- ggplot(top_kilicdaroglu_words, 
                            aes(x = reorder(word, n), y = n)) +
  geom_bar(stat = "identity", fill = "red") +
  # labs() names the aesthetics, so x is the word and y is the count even after coord_flip()
  labs(title = "Most Common Words in Kilicdaroglu Titles", 
       x = "Word", y = "Count") +
  theme_minimal() +
  theme(axis.text.y = element_text(angle = 0, hjust = 1)) +
  coord_flip()
ggsave("kilicdaroglu.png", plot = kilicdaroglu_plot, dpi = 300)

We can also show this as a word cloud.

kilicdaroglu_wordcloud <- ggplot(top_kilicdaroglu_words, 
                                 aes(label = word, size = n)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 15) +
  theme_void() +
  labs(title = "Most Common Words in Kilicdaroglu Titles")
kilicdaroglu_wordcloud

erdogan_wordcloud <- ggplot(top_erdogan_words, 
                                 aes(label = word, size = n)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 30) +
  theme_void() +
  labs(title = "Most Common Words in Erdogan Titles")
erdogan_wordcloud

Sentiment Analysis

In the text analysis part we focused on the text itself and on which words occur in it, but we never looked at what the content is actually about.

sentiment <- read_xlsx("sentiment.xlsx")
## New names:
## • `` -> `...1`

First, let's load our data and prepare it for analysis.

dfm <- dfm(corpus(sentiment$fulltext) %>% 
             tokens(remove_punct = TRUE) %>%
             tokens_tolower() %>%
             tokens_remove(tr_stopwords) %>%
             tokens_wordstem(language = "tr")) 

Let's split the stories into topics and see how the words separate, on average, according to the stories in which they occur together.

set.seed(94)
lda_model <- textmodel_lda(dfm, k = 10)
top_terms <- terms(lda_model, n= 10)
top_terms_df <- data.frame(top_terms)
top_terms_df_1 <- top_terms_df[ , 1:5]
top_terms_df_2 <- top_terms_df[ , 6:10]
gt_table_1 <- gt::gt(top_terms_df_1) %>% 
  gt::tab_header(title = "Top Terms per Topic (1-5)")
gt_table_2 <- gt::gt(top_terms_df_2) %>% 
  gt::tab_header(title = "Top Terms per Topic (6-10)")
gt_table_1
Top Terms per Topic (1-5)
topic1 topic2 topic3 topic4 topic5
kılıçdaroğlu erdoğa em kemal erdoğa
genel millet erdoğa bay yıl
parti seç çocuk millet ülke
kemal genç dedi işçi millet
başka başka ev büyük
chp ak hesap başka son
kılıçdaroğlu'n ülke medya ifade başka
aday söz atık erdoğa ala
hdp an ada yer
i̇yi̇ parti erdoğan' diyecek dünya
gt_table_2
Top Terms per Topic (6-10)
topic6 topic7 topic8 topic9 topic10
başka ülke depre başka su
erdoğa türki erdoğa cumhurbaşkan açılış
recep türk vatandaş terör lira
tayyip dünya başka medya yol
cumhurbaşka uluslararas çalışma erdoğan' başka
genel konu bölge i̇sveç' yüz
ziyaret el konut karar proje
erdoğan' barış felaket örgüt tünel
görüşme halk afet eyle i̇stanbul
kabul ilişki yer i̇sveç yatır

This time let's group them under 5 topics and look at the difference. What changed?

set.seed(94)
lda_model2 <- textmodel_lda(dfm, k = 5)
top_terms2 <- terms(lda_model2, n = 10)
top_terms2_df <- data.frame(top_terms2)
gt_table_3 <- gt::gt(top_terms2_df) %>% 
  gt::tab_header(title = "Top Terms per Topic")
gt_table_3
Top Terms per Topic
topic1 topic2 topic3 topic4 topic5
başka erdoğa erdoğa başka millet
parti depre yıl erdoğa erdoğa
genel başka başka cumhurbaşka ülke
kılıçdaroğlu vatandaş ülke tayyip genç
kemal bölge dünya recep söz
chp çalışma su türk siz
aday konut açılış erdoğan' an
kılıçdaroğlu'n felaket erdoğan' görüşme zama
cumhurbaşka devlet proje türki türki
6 afet tarih kabul dedi
topics <- topics(lda_model2)
sentiment$Topic <- topics
topic_labels <-  c("Erdogan","Turkey", "Kilicdaroglu", "Earthquake", "Election" )
sentiment$Topic <- factor(sentiment$Topic, labels = topic_labels)
topic_counts <- table(sentiment$Topic)
topic_counts <- as.data.frame(topic_counts)

Top Terms per Topic
topic1 topic2 topic3 topic4 topic5
an hdp konut emine hizmet
terör masa felaket türk i̇stanbul
i̇sveç akşener depremzede görüşme dünya
14 i̇yi̇ kahramanmaraş dünya alan
bay adaylık ev kabul proje
genç afet halk su
mayıs partisi hatay uluslararası kadın
siyasi koalisyon il mesaj sahip
masa isim yardım ilet hayat
örgüt görüşme bura atık yatırım

We can also see how many stories fall under each topic this way.

Cluster Frequencies
Cluster Frequency
1 7
2 35
3 6
4 64
5 480

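Let's split the words into positive and negative. We do this with a method called lasso regression, which we will not learn now, and it lets us see which words predict what. These are the words that occur most in the stories we grouped as negative or positive. In the code below the response is the erdogan indicator, so a positive coefficient means the word predicts erdogan = 1 and a negative coefficient means it predicts erdogan = 0.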

predictors <- convert(dfm, to = 'matrix')
response <- sentiment$erdogan
set.seed(94)
lasso <- cv.glmnet(predictors, response, family = "binomial", alpha = 1)
tmp_coeffs <- coef(lasso, s = "lambda.min")
coeffs_df <- data.frame(name = tmp_coeffs@Dimnames[[1]][tmp_coeffs@i + 1], coefficient = tmp_coeffs@x)
coeffs_df <- coeffs_df[coeffs_df$name != "(Intercept)",]
coeffs_df <- coeffs_df[order(-abs(coeffs_df$coefficient)),]
top_positive_words <- head(coeffs_df[coeffs_df$coefficient > 0,], 10)
top_negative_words <- head(coeffs_df[coeffs_df$coefficient < 0,], 10)

Words that occur most in the positive stories

Top Positive Words
Word Coefficient
ereğli' 7.2696803
beledi̇yeni̇n 2.5043161
başvura 2.4157332
tazmina 2.3183030
toplanmış 1.7814972
daires 1.0560055
yama 0.9433218
tayyip 0.6628557
etkileşim 0.6190165
onardi 0.4788912

Words that occur most in the negative stories

top_negative_words <- gt::gt(top_negative_words) %>%
  gt::tab_header(title = "Top Negative Words") %>%
  gt::cols_label(name = "Word", coefficient = "Coefficient")
top_negative_words
Top Negative Words
Word Coefficient
kılıçdaroğlu’n -4.749593
sevigen' -4.154531
ağırlayarak -3.702199
flört -3.103029
pervasız -2.786683
kılıçdaroğlu’na -2.731575
tüm -2.563847
mensubiyet -2.469966
çürüdü -2.436991
açiklama -2.348500

Let's remove the names and look again.

# !grepl() is safer than -grep(): if nothing matches, -grep() would drop every row
coeffs_df <- coeffs_df[!grepl('erdoğa|kılıçdaroğlu|tayyip|recep|kemal', coeffs_df$name), ]
coeffs_df <- coeffs_df[order(-abs(coeffs_df$coefficient)),]
top_words <- head(coeffs_df, 10)

A reminder of what to watch out for when doing this: going beyond what robots.txt allows is considered unethical, and many sites already have blocks in place against it. If you keep trying to pull data continuously, the site may also block you by IP address, temporarily or permanently. That would prevent any future work on that site.

Some sites do not want to be scraped directly but allow access through an application programming interface (API). These may be paid or free, but they make regular data collection possible.

R has packages for more advanced sentiment analysis, but they work with English. To do it in Turkish we need Python. Fortunately, we can call Python while working in R and get the job done.

First, let's run the Python steps.

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
import pandas as pd
from openpyxl import Workbook
model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
sa= pipeline("sentiment-analysis", tokenizer=tokenizer, model=model)
## WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
##     PyTorch 1.13.1 with CUDA None (you have 2.3.0)
##     Python  3.9.12 (you have 3.9.12)
##   Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
##   Memory-efficient attention, SwiGLU, sparse and more won't be available.
##   Set XFORMERS_MORE_DETAILS=1 for more details
input_file = "/Users/alionurgitmez/Desktop/R Dersi/Hafta 15/sabah_PS2.xlsx"
output_file = "after_analysis.xlsx"
df = pd.read_excel(input_file)
print(df.head())
##    Unnamed: 0  ...                                lemmatized_fulltext
## 0           0  ...  Başkan Erdoğan dan Ali İhsan Destici taziye me...
## 1           1  ...  Emine Erdoğan vatandaş böyle seslen Duyarlı he...
## 2           2  ...  Başkan Erdoğan Yeni Azerbaycan Partisi Gençler...
## 3           3  ...  Son dakika Denizli emekçi kadın buluş Başkan E...
## 4           4  ...  O fotoğraf kahraman konuş Huriye teyze Erdoğan...
## 
## [5 rows x 8 columns]

Here we run the sentiment model.

sentiment_results = df['title'].apply(lambda x: sa(x)[0])
df['sentiment_label'] = [result['label'] for result in sentiment_results]
df['sentiment_score'] = [result['score'] for result in sentiment_results]
df.to_excel(output_file, index=False)

Here we load the saved data into R, which brings us back to R. The two can work side by side.

## [1] "Average positive sentiment score of Kilicdaroglu articles is: 0.85"
## [1] "Average positive sentiment score of Erdogan articles is: 0.91"
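
The chunk that produced these two lines is not shown in the notes. One plausible way to get numbers of this kind, sketched on made-up data, is to keep the positively labelled stories and average the score per leader. The column values, the label text "positive" and the toy scores are all assumptions; the actual labels returned by the model may differ.

```r
# Toy stand-in for the data read back from after_analysis.xlsx (all values are made up)
stories <- data.frame(
  erdogan         = c(1, 1, 0, 0, 0),
  sentiment_label = c("positive", "negative", "positive", "positive", "negative"),
  sentiment_score = c(0.90, 0.70, 0.80, 0.60, 0.95)
)

# Keep the positively labelled stories, then average the score per leader
positive <- stories[stories$sentiment_label == "positive", ]
avg <- tapply(positive$sentiment_score, positive$erdogan, mean)

sprintf("Average positive sentiment score of Kilicdaroglu articles is: %.2f", avg[["0"]])
sprintf("Average positive sentiment score of Erdogan articles is: %.2f", avg[["1"]])
```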

Let's visualize it as well.

sabah_stories_positive$date <- as.Date(sabah_stories_positive$date)

average_sentiment2 <- sabah_stories_positive %>%
  mutate(week = as.integer(format(date, "%U"))) %>%
  group_by(erdogan, week) %>%
  summarize(average_sentiment = mean(sentiment_score))

ggplot(average_sentiment2, aes(x = week, y = average_sentiment, color = factor(erdogan))) +
  geom_line() +
  labs(x = "Week", y = "Average Sentiment", color = "Leader") +
  scale_color_manual(values = c("red", "orange"), labels = c("Kilicdaroglu", "Erdogan")) +
  scale_x_continuous(breaks = unique(average_sentiment2$week)) +
  theme_minimal()

Spatial Analysis

Spatial analysis is a developing method, especially in IR and in conflict studies. It lets us draw inferences using political and physical maps or satellite data. Like text analysis, it is a very broad field, so I will only give a short introduction. First, let's load the packages that are essential in this area.

library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(raster)
## Loading required package: sp
## 
## Attaching package: 'raster'
## The following object is masked from 'package:dplyr':
## 
##     select
library(spData)
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
library(viridis)
## Loading required package: viridisLite
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:viridis':
## 
##     viridis_pal
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor

Let's prepare an example dataset and look at it. This dataset covers sudden infant death syndrome (SIDS) in the counties of North Carolina. Let's also visualize it. Because we read it in as an sf object, plot() draws it as a map.

nc <- st_read(system.file("shapes/sids.shp", package = "spData"))
## Reading layer `sids' from data source 
##   `/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/spData/shapes/sids.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 22 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## CRS:           NA
plot(nc["SID74"]) 

Let's create a 10 km buffer around each county.

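# sids.shp has no CRS, so dist would be read in degrees; assuming NAD27 long-lat (EPSG:4267), project to metres (EPSG:32119)
nc <- st_transform(st_set_crs(nc, 4267), 32119)
buffered_nc <- st_buffer(nc, dist = 10000)  # 10,000 m = 10 km
plot(buffered_nc, max.plot = 22)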
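
How `dist` is interpreted depends on the layer's coordinate reference system. A layer with no CRS is treated as planar, so the distance is taken in whatever units the coordinates use, which for longitude and latitude means degrees. With a projected CRS in metres, 10000 really is 10 km. The sketch below checks this on a single point; the coordinates are arbitrary and EPSG:32119 (NAD83 / North Carolina) is simply one metre-based CRS chosen for illustration.

```r
library(sf)

# A single point in a metre-based CRS; the coordinates are arbitrary
pt <- st_sfc(st_point(c(600000, 200000)), crs = 32119)

buf <- st_buffer(pt, dist = 10000)          # 10,000 m = 10 km
area_km2 <- as.numeric(st_area(buf)) / 1e6  # should be close to pi * 10^2, about 314 km^2
area_km2
```

The area comes out slightly below the exact circle area because the buffer is a polygon approximating a circle.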

Let's find the overlapping areas. This shows the areas where our original layer and the 10 km buffered layer intersect.

intersection_nc <- st_intersection(nc, buffered_nc)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
plot(intersection_nc)
## Warning: plotting the first 10 out of 44 attributes; use max.plot = 44 to plot
## all

We can also use this as an alternative way of visualizing.

library(tmap)
## Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
## remotes::install_github('r-tmap/tmap')
tm_shape(nc) +
  tm_polygons(col = "SID74", palette = "Blues", title = "SIDS Rate 1974") +
  tm_layout(legend.outside = TRUE)
## Warning: Currect projection of shape nc unknown. Long-lat (WGS84) is assumed.

Another example of working with maps is Turkish elections. We can do this with district-level and province-level election results.

library(haven)
library(tidyverse)
library(TRmaps)
ilce_election <- read_dta("Elections_wide.dta")
ilce_election <- ilce_election %>% dplyr::select(adm1_tr, adm2_tr, akp_2018P, chp_2018P, mhp_2018P, 
                                          iyi_2018P, hdp_2018P, sp_2018P)

ilce_election$adm2_tr <- gsub(".*MERKEZ*", "MERKEZ", ilce_election$adm2_tr)

convert_to_lower <- function(str) {
  converted_str <- str_to_sentence(str, locale = "tr")  
  return(converted_str)
}
ilce_election$adm1_tr <- convert_to_lower(ilce_election$adm1_tr)
## Warning in stri_trans_totitle(string, opts_brkiter = stri_opts_brkiter(type =
## "sentence", : A resource bundle lookup returned a result either from the root
## or the default locale.
ilce_election$adm2_tr <- convert_to_lower(ilce_election$adm2_tr)
## Warning in stri_trans_totitle(string, opts_brkiter = stri_opts_brkiter(type =
## "sentence", : A resource bundle lookup returned a result either from the root
## or the default locale.
ilce_election$il_ilce <- paste(ilce_election$adm1_tr, ilce_election$adm2_tr, sep = "-")

ilce_election <- ilce_election %>% dplyr::select(-adm1_tr, -adm2_tr)

tr_ilce <- as.data.frame(st_as_sf(tr_ilce))

tr_ilce <- tr_ilce %>% dplyr::select(il_ilce, geometry, tuik_no, Shape_Area, Shape_Leng)

tr_ilce <- tr_ilce[tr_ilce$il_ilce != "Hakkari-Derecik", ]

ilce_election <- ilce_election %>%
  mutate(il_ilce = ifelse(il_ilce == "Ankara-Kazan", "Ankara-Kahramankazan", il_ilce))
ilce_election <- ilce_election %>%
  mutate(il_ilce = ifelse(il_ilce == "Samsun-19 mayıs", "Samsun-19 Mayıs", il_ilce))
ilce_election <- ilce_election %>%
  mutate(il_ilce = ifelse(il_ilce == "Kırıkkale-Bahşili", "Kırıkkale-Bahşılı", il_ilce))

final_ilce_election <- merge(ilce_election, tr_ilce, by.x = "il_ilce", by.y = "il_ilce")
final_ilce_election$cumhur <- final_ilce_election$akp_2018P + final_ilce_election$mhp_2018P
final_ilce_election$millet <- final_ilce_election$chp_2018P + final_ilce_election$iyi_2018P + final_ilce_election$sp_2018P
final_ilce_election$hdp <- final_ilce_election$hdp_2018P 


final_ilce_election$kazanan <- ifelse(final_ilce_election$hdp > final_ilce_election$millet & final_ilce_election$hdp > final_ilce_election$cumhur, "HDP",
                                      ifelse(final_ilce_election$millet > final_ilce_election$hdp & final_ilce_election$millet > final_ilce_election$cumhur, "MILLET",
                                             ifelse(final_ilce_election$cumhur > final_ilce_election$hdp & final_ilce_election$cumhur > final_ilce_election$millet, "CUMHUR", "ESIT")))
final_ilce_election_sf <- st_as_sf(final_ilce_election)
my_colors <- c("HDP" = "#8f1ca5", "MILLET" = "red", "CUMHUR" = "orange", "CHP" = "firebrick1", 
               "MHP" = "darkred", "IYI" = "cyan", "SAADET" = "maroon", "ESIT" = "darkgrey")

ggplot(final_ilce_election_sf) + 
  geom_sf(aes(fill = kazanan)) +
  scale_fill_manual(values = my_colors) +
  labs(fill = "Kazanan") + 
  theme_void() +
  ggtitle("Tum Ittifaklar Beraber") 

Let's examine an example of analysis with spatial data. First we load the data we will use and transform it.

url <- "https://www2.census.gov/geo/tiger/TIGER2019/COUNTY/tl_2019_us_county.zip"
download.file(url, destfile = "us_counties.zip")
unzip("us_counties.zip")
counties <- st_read('/Users/alionurgitmez/Desktop/R Dersi/Hafta 15/tl_2019_us_county.shp')
## Reading layer `tl_2019_us_county' from data source 
##   `/Users/alionurgitmez/Desktop/R Dersi/Hafta 15/tl_2019_us_county.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 3233 features and 17 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.43979
## Geodetic CRS:  NAD83
counties_transformed <- st_transform(counties, 2163)
covid_url <- "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv"
covid_data <- read.csv(covid_url)
write_csv(covid_data, "covid_data.csv")
covid_data <- read_csv("covid_data.csv")
## Rows: 2502832 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): county, state
## dbl  (3): fips, cases, deaths
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
covid_latest <- covid_data %>%
  group_by(fips) %>%
  filter(date == max(date)) %>%
  ungroup() %>%
  dplyr::select(fips, cases, deaths)
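# GEOID is a 5-character string, so pad the numeric FIPS code: 1001 becomes "01001"
covid_latest$fips <- str_pad(as.character(covid_latest$fips), width = 5, pad = "0")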
counties_covid <- counties_transformed %>%
  left_join(covid_latest, by = c("GEOID" = "fips"))
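# Note: ALAND is land area in square metres, so this is cases per 100,000 m² of land, not per 100,000 people
counties_covid <- counties_covid %>%
  mutate(cases_per_100k = (cases / as.numeric(ALAND)) * 100000)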
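
The code above divides cases by ALAND, the county's land area in square metres, so the result is a density per unit of land rather than a rate per person. The sketch below puts the two measures side by side on toy data to show how differently they can rank places. The `population` column is hypothetical and is not part of the TIGER shapefile; all numbers are made up.

```r
# Toy counties; the population column is hypothetical and the values are invented
toy <- data.frame(
  name       = c("A", "B"),
  cases      = c(5000, 5000),
  population = c(100000, 1000000),
  ALAND      = c(2e8, 5e7)  # land area in square metres
)

# Rate per person, scaled to 100,000 residents
toy$cases_per_100k_people <- toy$cases / toy$population * 1e5

# Density per square kilometre of land
toy$cases_per_km2 <- toy$cases / (toy$ALAND / 1e6)

toy
```

County A has the higher rate per person while county B has the higher density per area, so which measure is used changes which places look like hotspots.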

Let's look at the places where cases are very high.

hotspots <- counties_covid %>%
  arrange(desc(cases_per_100k)) %>%
  slice_head(n = 10)
print(hotspots[, c("NAME", "STATEFP", "cases", "cases_per_100k")])
## Simple feature collection with 10 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 1930000 ymin: -428834.5 xmax: 2352825 ymax: 140953.9
## Projected CRS: NAD27 / US National Atlas Equal Area
##                    NAME STATEFP  cases cases_per_100k
## 1               Suffolk      25 226017      149.81600
## 2                Hudson      34 179135      149.72731
## 3          Philadelphia      42 317808       91.38143
## 4  District of Columbia      11 143943       90.90732
## 5            Alexandria      51  32647       84.40414
## 6             Arlington      51  46198       68.61124
## 7                 Essex      34 222558       68.15279
## 8         Manassas Park      51   3828       58.54358
## 9                Nassau      36 425386       57.70325
## 10                Union      34 151057       56.75346
##                          geometry
## 1  MULTIPOLYGON (((2322932 130...
## 2  MULTIPOLYGON (((2144411 -12...
## 3  MULTIPOLYGON (((2074643 -24...
## 4  MULTIPOLYGON (((1955476 -40...
## 5  MULTIPOLYGON (((1956985 -41...
## 6  MULTIPOLYGON (((1952303 -40...
## 7  MULTIPOLYGON (((2122970 -12...
## 8  MULTIPOLYGON (((1930000 -42...
## 9  MULTIPOLYGON (((2171046 -10...
## 10 MULTIPOLYGON (((2119662 -14...

Let's identify high-case areas by looking at neighbouring counties.

high_case_counties <- counties_covid %>%
  filter(cases_per_100k > quantile(cases_per_100k, 0.9, na.rm = TRUE))

neighbors <- st_join(counties_covid, high_case_counties, join = st_touches, left = FALSE)

clusters <- neighbors %>%
  group_by(GEOID.x) %>%
  summarise(cluster_size = n())
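
To make the neighbour logic easier to follow, here is a sketch of `st_touches` on three toy squares. Two of them share an edge and the third sits apart. The helper `square` and the layer `toy` are made up for illustration.

```r
library(sf)

# Build a unit square whose lower-left corner is at (x0, 0)
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Squares a and b share an edge, c is separate
toy <- st_sf(id = c("a", "b", "c"),
             geometry = st_sfc(square(0), square(1), square(3)))

# For each feature, the indices of the features it touches
touching <- st_touches(toy)
lengths(touching)

# Same idea as the join above: one row per (feature, touching neighbour) pair
pairs <- st_join(toy, toy, join = st_touches, left = FALSE)
nrow(pairs)
```

With `left = FALSE` the join keeps only features that have at least one touching neighbour, which is why the isolated square drops out.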

Now let's visualize the data we have:

library(viridis)
library(scales)
ggplot(data = counties_covid) +
  geom_sf(aes(fill = cases_per_100k), color = NA) +
  scale_fill_viridis(trans = "log", labels = comma, 
                     name = "Cases per 100,000 m²\n(log scale)") +
  theme_minimal() +
  theme(legend.position = "right") +
  labs(title = "COVID-19 Cases per 100,000 m² of Land Area by US County",
       subtitle = "Data as of latest available date",
       caption = "Source: New York Times COVID-19 Data")

We have been calling this type of spatial data vector data. Another type of spatial data is raster data, the kind of map we also see in physical maps.

library(raster)
library(terra)
## terra 1.7.78
## 
## Attaching package: 'terra'
## The following object is masked from 'package:scales':
## 
##     rescale
## The following object is masked from 'package:quanteda':
## 
##     meta
## The following object is masked from 'package:tidyr':
## 
##     extract
elevation <- raster("raster.tif")
print(elevation)
## class      : RasterLayer 
## dimensions : 10812, 10812, 116899344  (nrow, ncol, ncell)
## resolution : 9.259259e-05, 9.259259e-05  (x, y)
## extent     : -106.0006, -104.9994, 38.99944, 40.00056  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=NAD83 +no_defs 
## source     : raster.tif 
## names      : raster
summary(elevation)
## Warning in .local(object, ...): summary is an estimate based on a sample of 1e+05 cells (0.09% of all cells)
##           raster
## Min.    1555.488
## 1st Qu. 2363.127
## Median  2744.982
## 3rd Qu. 3063.692
## Max.    4317.352
## NA's       0.000
plot(elevation, main = "Elevation in Colorado", xlab = "Longitude", ylab = "Latitude")

From here we can also run a physical analysis. For example, as with the data I will discuss shortly, we could use slope instead of elevation in the analysis.

slope <- terrain(elevation, opt = "slope", unit = "degrees")
aspect <- terrain(elevation, opt = "aspect", unit = "degrees")
par(mfrow = c(1, 2))
plot(slope, main = "Slope in Colorado", xlab = "Longitude", ylab = "Latitude")
plot(aspect, main = "Aspect in Colorado", xlab = "Longitude", ylab = "Latitude")

Terror example

library(rnaturalearth)
library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:viridis':
## 
##     unemp
## The following object is masked from 'package:purrr':
## 
##     map
library(rnaturalearthdata)
## 
## Attaching package: 'rnaturalearthdata'
## The following object is masked from 'package:rnaturalearth':
## 
##     countries110
library(TRmaps)
library(Hmisc)
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:terra':
## 
##     describe, mask, units, zoom
## The following objects are masked from 'package:raster':
## 
##     mask, zoom
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, units
library(gt)
## 
## Attaching package: 'gt'
## The following object is masked from 'package:Hmisc':
## 
##     html
library(tidyverse)
load("pkkattacks.rdata")
pkkattacks <- dataNewC %>% dplyr::select(attack_p22, nkill_n_p22, nwound_n_p22, margin,
                                   unemployment, literacy, infant_mort_perthousand, attack_p1,
                                   subsidy, curfew, border, district, X_coordinate, Y_coordinate, population, district)
pkkattacks$curfew[pkkattacks$curfew == "TRUE"] <- 1
pkkattacks$border[pkkattacks$border == "TRUE"] <- 1

Let's fetch the elevation data. Because this takes a long time, I fetched it beforehand and will use that saved copy.

elevation_url <- "https://maps.googleapis.com/maps/api/elevation/json"
geocoding_url <- "https://maps.googleapis.com/maps/api/geocode/json"
# Read the key from an environment variable instead of hard-coding it
# (the variable name GOOGLE_MAPS_API_KEY is an assumption; use your own).
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY")
elevation_data <- c()
location_names <- c()
for (i in seq_len(nrow(pkkattacks))) {
  # Coordinates of district i
  lat <- dataNewC[i, "Y_coordinate"]
  lng <- dataNewC[i, "X_coordinate"]

  # Query the Elevation API for this point
  elevation_params <- list(
    locations = paste(lat, lng, sep = ","),
    key = api_key
  )
  elevation_response <- GET(elevation_url, query = elevation_params)
  elevation <- content(elevation_response)$results[[1]]$elevation

  # Reverse-geocode the same point to get a readable address
  geocoding_params <- list(
    latlng = paste(lat, lng, sep = ","),
    key = api_key
  )
  geocoding_response <- GET(geocoding_url, query = geocoding_params)
  location_name <- content(geocoding_response)$results[[1]]$formatted_address

  # Append the results for this district
  elevation_data <- c(elevation_data, elevation)
  location_names <- c(location_names, location_name)
}
library(readxl)
pkkattacks_yeni <- read_xlsx("pkkattacks.xlsx")
## New names:
## • `district_id` -> `district_id...2`
## • `province.x` -> `province.x...3`
## • `province.x` -> `province.x...11`
## • `province.y` -> `province.y...44`
## • `province.y` -> `province.y...48`
## • `district_id` -> `district_id...56`
## • `province.x2` -> `province.x2...57`
## • `province.x2` -> `province.x2...66`
## • `province.y2` -> `province.y2...99`
## • `province.y2` -> `province.y2...103`
pkkattacks_yeni <- pkkattacks_yeni %>% dplyr::select(attack_p22, nkill_n_p22, nwound_n_p22, margin, unemployment, literacy, infant_mort_perthousand, attack_p1, subsidy, curfew, border, district, X_coordinate, Y_coordinate, population, district, elevation, location_name)

Let's compute the distances to the border.

countries <- ne_countries(returnclass = "sf")
syria_border <- subset(countries, name == "Syria")
iraq_border <- subset(countries, name == "Iraq")
iran_border <- subset(countries, name == "Iran")
cities_sf <- st_as_sf(pkkattacks_yeni, coords = c("X_coordinate", "Y_coordinate"), crs = 4326)
# st_distance() returns metres as a units object; convert to plain kilometres
pkkattacks_yeni$syria <- as.numeric(st_distance(cities_sf, syria_border)) / 1000
pkkattacks_yeni$iran <- as.numeric(st_distance(cities_sf, iran_border)) / 1000
pkkattacks_yeni$iraq <- as.numeric(st_distance(cities_sf, iraq_border)) / 1000
pkkattacks_yeni <- pkkattacks_yeni %>%
  mutate(border_distance = pmin(syria, iraq, iran, na.rm = TRUE))

Let's make some maps. First, an elevation map.

map <- pkkattacks_yeni %>% dplyr::select(district, X_coordinate, Y_coordinate, elevation, population, attack_p22)

pkkattacks_sf <- st_as_sf(map, coords = c("X_coordinate", "Y_coordinate"), crs = 4326)

turkey_map <- ne_states(country = "Turkey", returnclass = "sf")

color_scale <- terrain.colors(n = 7)
color_scale[7] <- rgb(102, 51, 0, maxColorValue = 255)
districtmap <- ggplot() +
  geom_sf(data = turkey_map, fill = "transparent", color = "black") +
  geom_sf(data = pkkattacks_sf, aes(color = elevation, size = population)) +
  scale_color_gradientn(colors = color_scale, name = "Elevation") +
  labs(title = "Map of Districts", color = "Elevation", size = "Population") +
  guides(size = "none") +
  theme_bw()
districtmap

Map of the attacks

attackmap <- ggplot() +
  geom_sf(data = turkey_map, fill = "transparent", color = "black") +
  geom_sf(data = pkkattacks_sf[pkkattacks_sf$attack_p22 != 0,], aes(color = elevation, size = attack_p22)) +
  scale_color_gradientn(colors = color_scale, name = "Elevation") +
  labs(title = "Map of Attacks", color = "Elevation", size = "Attacks") +
  guides(size = "none") +
  theme_bw()
attackmap

Let's make a more detailed map.

tr_ilce_sf <- st_as_sf(tr_ilce)
attackmap2 <- ggplot() + 
  geom_sf(data = tr_ilce_sf) +
  geom_sf(data = turkey_map, fill = "transparent", color = "black") +
  geom_sf(data = pkkattacks_sf[pkkattacks_sf$attack_p22 != 0,], aes(color = elevation, size = attack_p22)) +
  scale_color_gradientn(colors = color_scale, name = "Elevation") +
  labs(title = "Map of Attacks", color = "Elevation", size = "Attacks") +
  guides(size = "none") +
  theme_bw()
attackmap2

Network Analysis

Like the other areas covered here, network analysis is a rapidly developing field. It has a great deal of scientific depth, so I cannot cover it in detail. Instead I will focus on basic analysis, visualization, and a few simple concepts. It is a field with degree programs of its own.

library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:terra':
## 
##     blocks, compare, union
## The following object is masked from 'package:raster':
## 
##     union
## The following object is masked from 'package:seededlda':
## 
##     sizes
## The following objects are masked from 'package:lubridate':
## 
##     %--%, union
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## The following object is masked from 'package:tidyr':
## 
##     crossing
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(igraphdata)
data(karate)

Network analysis has a few very basic concepts:

  • Nodes (Vertices): The units of the network.
  • Edges (Links): The relationships between nodes.

Different network types:

  • Directed vs undirected: Whether the edge, that is the relationship between two nodes, has a direction.
  • Weighted vs unweighted: Whether the edge carries the strength of the relationship between two nodes.

Simple network

g <- graph(edges = c(1, 2, 1, 3, 2, 4, 3, 4, 4, 5), directed = FALSE)
plot(g, vertex.label = V(g)$name, edge.arrow.size = 0.5)

Directed network

g_directed <- graph(edges = c(1, 2, 1, 3, 2, 4, 3, 4, 4, 5), directed = TRUE)
plot(g_directed, vertex.label = V(g_directed)$name, edge.arrow.size = 0.5)

Weighted network

g_weighted <- graph(edges = c(1, 2, 1, 3, 2, 4, 3, 4, 4, 5), directed = FALSE)
E(g_weighted)$weight <- c(1, 2, 1.5, 2.5, 3)
plot(g_weighted, vertex.label = V(g_weighted)$name, edge.width = E(g_weighted)$weight)

Sized network

g_sized <- graph(edges = c(1, 2, 1, 3, 2, 4, 3, 4, 4, 5), directed = FALSE)
vertex_degree <- degree(g_sized)
plot(g_sized, vertex.label = V(g_sized)$name, vertex.size = vertex_degree * 10, edge.arrow.size = 0.5)

With the adjacency matrix we can see how each node relates to every other node.

as_adjacency_matrix(g)
## 5 x 5 sparse Matrix of class "dgCMatrix"
##               
## [1,] . 1 1 . .
## [2,] 1 . . 1 .
## [3,] 1 . . 1 .
## [4,] . 1 1 . 1
## [5,] . . . 1 .
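
As a minimal sketch of what the adjacency matrix encodes, the same five-node graph can be rebuilt in base R without igraph. The names `edge_list`, `adj`, and `node_degree` are illustrative assumptions, not part of the original code.

```r
# Edge list of the toy graph above: 1-2, 1-3, 2-4, 3-4, 4-5 (undirected)
edge_list <- matrix(c(1, 2, 1, 3, 2, 4, 3, 4, 4, 5), ncol = 2, byrow = TRUE)

n <- max(edge_list)
adj <- matrix(0, nrow = n, ncol = n)
for (k in seq_len(nrow(edge_list))) {
  i <- edge_list[k, 1]
  j <- edge_list[k, 2]
  adj[i, j] <- 1
  adj[j, i] <- 1  # undirected: mirror every edge
}

# In an undirected graph the row sums of the adjacency matrix are the node degrees
node_degree <- rowSums(adj)
print(adj)
print(node_degree)
```

The matrix is symmetric because the graph is undirected; for a directed graph only `adj[i, j]` would be set.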

We can also build a network graph from a data frame.

edges <- data.frame(from = c("A", "A", "B", "C", "D"),
                    to = c("B", "C", "D", "E", "E"))
g_df <- graph_from_data_frame(edges, directed = TRUE)
plot(g_df)

plot(karate, vertex.label = V(karate)$name, edge.arrow.size = 0.5)
## This graph was created by an old(er) igraph version.
##   Call upgrade_graph() on it to use with the current igraph version
##   For now we convert it on the fly...

Now let's extract some important metrics. Degree centrality shows how many other nodes each node is connected to, which gives a first indication of how important a node is.

degree(g)
## [1] 2 2 2 3 1

Betweenness shows how often a node lies on the shortest path between two other nodes.

betweenness(g)
## [1] 0.5 1.0 1.0 3.5 0.0
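
To make the definition concrete, here is a base-R sketch that recomputes betweenness for the same toy graph by counting shortest paths. It assumes a small, connected, undirected, unweighted graph; all variable names are illustrative, and this brute-force approach is not how igraph computes the measure internally.

```r
# Adjacency matrix of the toy graph: 1-2, 1-3, 2-4, 3-4, 4-5
edge_list <- matrix(c(1, 2, 1, 3, 2, 4, 3, 4, 4, 5), ncol = 2, byrow = TRUE)
n <- max(edge_list)
adj <- matrix(0, n, n)
adj[edge_list] <- 1
adj[edge_list[, 2:1]] <- 1

# dist_mat[s, t]: shortest-path length; n_paths[s, t]: number of shortest paths.
# A walk of length d between two nodes at distance d is a shortest path, so the
# first power of adj with a non-zero entry gives both quantities at once.
dist_mat <- matrix(Inf, n, n)
n_paths <- matrix(0, n, n)
diag(dist_mat) <- 0
diag(n_paths) <- 1
walks <- diag(n)
for (d in seq_len(n - 1)) {
  walks <- walks %*% adj
  newly_reached <- walks > 0 & is.infinite(dist_mat)
  dist_mat[newly_reached] <- d
  n_paths[newly_reached] <- walks[newly_reached]
}

# For every pair (s, t), add the share of shortest paths that pass through v
betweenness_manual <- numeric(n)
for (v in seq_len(n)) {
  for (s in seq_len(n - 1)) {
    for (t in (s + 1):n) {
      if (s == v || t == v) next
      if (dist_mat[s, v] + dist_mat[v, t] == dist_mat[s, t]) {
        betweenness_manual[v] <- betweenness_manual[v] +
          n_paths[s, v] * n_paths[v, t] / n_paths[s, t]
      }
    }
  }
}
print(betweenness_manual)
```

Node 4 scores highest because every shortest path to node 5 has to pass through it.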

Closeness looks at how close a node is to all the other nodes, which tells us how easily it can interact with them. It is based on the node's shortest-path distances to every other node; the values below are the reciprocal of the sum of those distances. A higher value means a more central node.

closeness(g)
## [1] 0.1428571 0.1666667 0.1666667 0.2000000 0.1250000

Diameter is the maximum distance between any two nodes in the network, that is the longest of all shortest paths. It shows how many edges we have to traverse to cover that distance.

diameter(g)
## [1] 3

Mean distance is the average number of edges on the shortest paths between pairs of nodes. It is again useful for judging how centralized the network is.

mean_distance(g, directed = FALSE)
## [1] 1.6
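
The three distance-based measures above all follow from one shortest-path distance matrix. The base-R sketch below builds that matrix with the Floyd-Warshall algorithm for the same toy graph; the variable names are illustrative assumptions, and the graph is assumed to be connected and unweighted.

```r
# Toy graph: 1-2, 1-3, 2-4, 3-4, 4-5 (undirected, unweighted)
edge_list <- matrix(c(1, 2, 1, 3, 2, 4, 3, 4, 4, 5), ncol = 2, byrow = TRUE)
n <- max(edge_list)

# Start with direct edges at distance 1, everything else unknown (Inf)
dist_mat <- matrix(Inf, n, n)
diag(dist_mat) <- 0
dist_mat[edge_list] <- 1
dist_mat[edge_list[, 2:1]] <- 1

# Floyd-Warshall: allow each node k in turn as an intermediate stop
for (k in seq_len(n)) {
  via_k <- outer(dist_mat[, k], dist_mat[k, ], "+")
  dist_mat <- pmin(dist_mat, via_k)
}

closeness_manual <- 1 / rowSums(dist_mat)                    # reciprocal of summed distances
diameter_manual <- max(dist_mat)                             # longest shortest path
mean_distance_manual <- mean(dist_mat[upper.tri(dist_mat)])  # average over node pairs

print(closeness_manual)
print(diameter_manual)
print(mean_distance_manual)
```

These should agree with the `closeness()`, `diameter()`, and `mean_distance()` results printed above.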

To produce a more polished plot:

library(tidygraph)
## 
## Attaching package: 'tidygraph'
## The following object is masked from 'package:igraph':
## 
##     groups
## The following object is masked from 'package:raster':
## 
##     select
## The following object is masked from 'package:quanteda':
## 
##     convert
## The following object is masked from 'package:stats':
## 
##     filter
library(ggraph)
## 
## Attaching package: 'ggraph'
## The following object is masked from 'package:sp':
## 
##     geometry
V(g)$name <- c("A", "B", "C", "D", "E")
g_tbl <- as_tbl_graph(g)
ggraph(g_tbl, layout = 'fr') +
  geom_edge_link(edge_alpha = 0.5) +
  geom_node_point(aes(size = degree(g_tbl)), color = 'skyblue') +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void()

Cliques are subsets of nodes in which every node is directly connected to every other node in the subset.

cliques_found <- cliques(g)
print(cliques_found)
## [[1]]
## + 1/5 vertex, named, from 18fb75a:
## [1] D
## 
## [[2]]
## + 1/5 vertex, named, from 18fb75a:
## [1] A
## 
## [[3]]
## + 1/5 vertex, named, from 18fb75a:
## [1] E
## 
## [[4]]
## + 2/5 vertices, named, from 18fb75a:
## [1] D E
## 
## [[5]]
## + 1/5 vertex, named, from 18fb75a:
## [1] C
## 
## [[6]]
## + 2/5 vertices, named, from 18fb75a:
## [1] A C
## 
## [[7]]
## + 2/5 vertices, named, from 18fb75a:
## [1] C D
## 
## [[8]]
## + 1/5 vertex, named, from 18fb75a:
## [1] B
## 
## [[9]]
## + 2/5 vertices, named, from 18fb75a:
## [1] A B
## 
## [[10]]
## + 2/5 vertices, named, from 18fb75a:
## [1] B D

Using the Louvain algorithm we can detect communities, that is groups of nodes that are closely related to one another. We use this especially when we want to find groups within a set of relationships. Unlike cliques, communities do not require every member to be directly connected: they are densely connected internally but only weakly connected to each other.

community <- cluster_louvain(g)
plot(community, g)

We can also group connected nodes using edge betweenness. This is especially useful on larger datasets, although the algorithm slows down considerably as the network grows.

eb_communities <- cluster_edge_betweenness(g)
plot(eb_communities, g)

Adding a new node and edge

g <- add_vertices(g, 1, name = "New Node")
g <- add_edges(g, c("New Node", "A"))
plot(g, vertex.label = V(g)$name, edge.arrow.size = 0.5)

Removing an edge

g <- delete_edges(g, E(g, P = c("New Node", "A")))
plot(g, vertex.label = V(g)$name, edge.arrow.size = 0.5)

Subsetting

sub_g <- induced_subgraph(g, vids = V(g)[degree(g) > 1])
plot(sub_g)

Let's also look at communities in a larger dataset.

community <- cluster_edge_betweenness(karate)
## Warning in cluster_edge_betweenness(karate): At
## vendor/cigraph/src/community/edge_betweenness.c:498 : Membership vector will be
## selected based on the highest modularity score.
membership(community)
##    Mr Hi  Actor 2  Actor 3  Actor 4  Actor 5  Actor 6  Actor 7  Actor 8 
##        1        1        2        1        3        3        3        1 
##  Actor 9 Actor 10 Actor 11 Actor 12 Actor 13 Actor 14 Actor 15 Actor 16 
##        4        2        3        1        1        2        4        4 
## Actor 17 Actor 18 Actor 19 Actor 20 Actor 21 Actor 22 Actor 23 Actor 24 
##        3        1        4        1        4        1        4        5 
## Actor 25 Actor 26 Actor 27 Actor 28 Actor 29 Actor 30 Actor 31 Actor 32 
##        5        5        6        5        2        6        4        4 
## Actor 33   John A 
##        4        4
plot(community, karate, vertex.label = V(karate)$name, main = "Communities in Karate Network")

Modularity measures how strongly the network divides into communities, which is useful on larger data like this.

modularity(community)
## [1] 0.345299
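
As a sketch of what the modularity score measures, the base-R code below applies the usual definition, the share of edges inside communities minus the share expected from the node degrees alone, to a hypothetical six-node graph. The graph, the community assignment `membership_vec`, and the variable names are assumptions made for illustration.

```r
# Hypothetical toy graph: triangles 1-2-3 and 4-5-6 joined by the single edge 3-4
edge_list <- matrix(c(1, 2, 1, 3, 2, 3, 4, 5, 4, 6, 5, 6, 3, 4), ncol = 2, byrow = TRUE)
n <- max(edge_list)
adj <- matrix(0, n, n)
adj[edge_list] <- 1
adj[edge_list[, 2:1]] <- 1

membership_vec <- c(1, 1, 1, 2, 2, 2)  # assumed assignment: one community per triangle

node_degree <- rowSums(adj)
n_edges <- sum(adj) / 2

# Expected number of edges between i and j if edges were placed by degree alone
expected <- outer(node_degree, node_degree) / (2 * n_edges)

# Sum (observed - expected) over pairs in the same community, then normalise
same_community <- outer(membership_vec, membership_vec, "==")
modularity_manual <- sum((adj - expected)[same_community]) / (2 * n_edges)
print(modularity_manual)
```

Putting all six nodes into a single community would give a score of zero, which is why a clearly positive value suggests real group structure.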

The clustering coefficient measures the degree to which the nodes in the network cluster together.

transitivity(karate, type = "global")
## [1] 0.2556818
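
The global clustering coefficient is three times the number of triangles divided by the number of connected triples. The base-R sketch below computes it for a hypothetical six-node graph; the graph and the variable names are assumptions for illustration.

```r
# Hypothetical toy graph: triangles 1-2-3 and 4-5-6 joined by the single edge 3-4
edge_list <- matrix(c(1, 2, 1, 3, 2, 3, 4, 5, 4, 6, 5, 6, 3, 4), ncol = 2, byrow = TRUE)
n <- max(edge_list)
adj <- matrix(0, n, n)
adj[edge_list] <- 1
adj[edge_list[, 2:1]] <- 1

# The trace of adj^3 counts every triangle 6 times (3 start nodes x 2 directions)
n_triangles <- sum(diag(adj %*% adj %*% adj)) / 6

# A connected triple is a node together with any two of its neighbours
node_degree <- rowSums(adj)
n_triples <- sum(choose(node_degree, 2))

global_transitivity <- 3 * n_triangles / n_triples
print(global_transitivity)
```

A value of 1 would mean that every pair of neighbors of a node is itself connected; a value of 0 would mean the graph contains no triangles at all.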

We can also compute this for each node.

transitivity(karate, type = "local")
##     Mr Hi   Actor 2   Actor 3   Actor 4   Actor 5   Actor 6   Actor 7   Actor 8 
## 0.1500000 0.3333333 0.2444444 0.6666667 0.6666667 0.5000000 0.5000000 1.0000000 
##   Actor 9  Actor 10  Actor 11  Actor 12  Actor 13  Actor 14  Actor 15  Actor 16 
## 0.5000000 0.0000000 0.6666667       NaN 1.0000000 0.6000000 1.0000000 1.0000000 
##  Actor 17  Actor 18  Actor 19  Actor 20  Actor 21  Actor 22  Actor 23  Actor 24 
## 1.0000000 1.0000000 1.0000000 0.3333333 1.0000000 1.0000000 1.0000000 0.4000000 
##  Actor 25  Actor 26  Actor 27  Actor 28  Actor 29  Actor 30  Actor 31  Actor 32 
## 0.3333333 0.3333333 1.0000000 0.1666667 0.3333333 0.6666667 0.5000000 0.2000000 
##  Actor 33    John A 
## 0.1969697 0.1102941

We can also look at the density of the network, the share of potential connections that actually exist. This tells us how much of its potential connectivity the network has realized.

edge_density(karate)
## [1] 0.1390374
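
For an undirected graph without self-loops, density is simply the number of edges divided by the number of possible node pairs. The helper `edge_density_manual()` below is a hypothetical name for a base-R sketch of that formula.

```r
# Density of a simple undirected graph: observed edges / possible node pairs
edge_density_manual <- function(n_nodes, n_edges) {
  n_edges / (n_nodes * (n_nodes - 1) / 2)
}

# The original five-node toy graph had 5 edges
print(edge_density_manual(5, 5))

# The karate network has 34 nodes and 78 edges
print(edge_density_manual(34, 78))
```

The second call should reproduce the value reported by `edge_density(karate)` above.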

Eigenvector centrality assigns a score to every node. It gives higher scores to nodes that are more central, meaning nodes that have more connections and whose neighbors are themselves well connected.

eigenvector_centrality <- eigen_centrality(g)$vector
print(eigenvector_centrality)
##         A         B         C         D         E  New Node 
## 0.7807764 0.8337830 0.8337830 1.0000000 0.4682132 0.0000000

Alternatively, we can build the plot with other packages.

library(tidygraph)
library(ggraph)
network_data <- tibble::tribble(
  ~from, ~to,
  "A", "B",
  "B", "C",
  "C", "D",
  "D", "A"
)

g_tbl <- as_tbl_graph(network_data)

ggraph(g_tbl, layout = "fr") + 
  geom_edge_link() + 
  geom_node_point() + 
  geom_node_text(aes(label = name), vjust = 1.5) +
  theme_void()

Let me illustrate this with a dataset from my own work, built from Twitter. It records troll attacks, praise, and criticism between countries, along with how many times each occurred. This lets us analyze edge width, edge direction, edge type, and node size together. Now let's load the data and analyze it.

library(readxl)
library(igraph)
library(ggraph)
library(tidyverse)
edges <- read_xlsx("edges.xlsx")
nodes <- read_xlsx("nodes.xlsx")
edges <- edges %>%
  mutate(attack_type = case_when(
    attack_type == "P" ~ "green",
    attack_type == "N" ~ "red"
  ))
used_nodes <- unique(c(edges$source, edges$target))
filtered_nodes <- nodes %>% filter(country %in% used_nodes)
graph <- graph_from_data_frame(d = edges, vertices = filtered_nodes, directed = TRUE)
V(graph)$country <- filtered_nodes$country
ggraph(graph, layout = "fr") + 
  geom_edge_link(aes(color = attack_type, width = attack_density), 
                 arrow = arrow(length = unit(3, "mm"), type = "open"), 
                 end_cap = circle(3, 'mm'), 
                 edge_alpha = 0.7, 
                 lineend = "round") + 
  geom_edge_loop(aes(color = attack_type, width = attack_density), 
                 arrow = arrow(length = unit(3, "mm"), type = "open"), 
                 end_cap = circle(3, 'mm'), 
                 edge_alpha = 0.7, 
                 lineend = "round") + 
  geom_node_point(aes(size = node_size), color = "blue", fill = "lightblue", shape = 21, stroke = 1) + 
  geom_node_text(aes(label = V(graph)$country), repel = TRUE, size = 5, color = "black") + 
  scale_size_continuous(range = c(4, 12)) + 
  scale_edge_width_continuous(range = c(0.5, 2)) + 
  scale_edge_color_manual(values = c("green" = "green", "red" = "red")) + 
  theme_void() + 
  theme(legend.position = "none")

Machine Learning

Machine learning is a method that lets computers learn from data, so that they can predict what is likely to happen when a similar situation arises in the future. There are two basic types.

  • Supervised: A model trained on data we have labeled, where we give the computer both the features and the known outcome.

  • Unsupervised: Based on grouping data without labels. Without being given an outcome variable, it looks for commonalities, groups the observations, and separates them on that basis.

A classic dataset in this field is the Titanic data. It is used to predict whether Titanic passengers survived, based on certain features. This will be our example of supervised learning. Some important variables:

  • Survived: Whether the passenger survived. Takes the values 0 and 1.
  • Pclass: The passenger's ticket class. Takes the values 1, 2, and 3.
  • Sex: The passenger's sex.
  • Age: The passenger's age.
  • Fare: The fare the passenger paid for the ticket.

library(titanic)
library(caret)
library(randomForest)
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
## 
##     combine
## The following object is masked from 'package:ggplot2':
## 
##     margin
library(e1071)
## 
## Attaching package: 'e1071'
## The following object is masked from 'package:Hmisc':
## 
##     impute
## The following object is masked from 'package:terra':
## 
##     interpolate
## The following object is masked from 'package:raster':
## 
##     interpolate
library(nnet)
library(xgboost)
## 
## Attaching package: 'xgboost'
## The following object is masked from 'package:tidygraph':
## 
##     slice
## The following object is masked from 'package:dplyr':
## 
##     slice
data("titanic_train")

First let's look at the missing values.

colSums(is.na(titanic_train))
## PassengerId    Survived      Pclass        Name         Sex         Age 
##           0           0           0           0           0         177 
##       SibSp       Parch      Ticket        Fare       Cabin    Embarked 
##           0           0           0           0           0           0
median(titanic_train$Age, na.rm = TRUE)
## [1] 28

The Age column has 177 missing values, which is a lot for 891 observations. So let's use imputation and fill them in with the median. This way we complete the data without changing the median.

titanic_train$Age[is.na(titanic_train$Age)] <- median(titanic_train$Age, na.rm = TRUE)
colSums(is.na(titanic_train))
## PassengerId    Survived      Pclass        Name         Sex         Age 
##           0           0           0           0           0           0 
##       SibSp       Parch      Ticket        Fare       Cabin    Embarked 
##           0           0           0           0           0           0
median(titanic_train$Age, na.rm = TRUE)
## [1] 28
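
A small base-R sketch of the same idea on made-up numbers: filling missing values with the median leaves the median where it was, but it also shrinks the variance, which is worth keeping in mind. The `age` vector below is hypothetical and is not taken from the Titanic data.

```r
age <- c(22, 38, NA, 35, NA, 54, 2)  # hypothetical ages with two missing values

median_before <- median(age, na.rm = TRUE)
var_before <- var(age, na.rm = TRUE)

# Median imputation: replace every NA with the median of the observed values
age_imputed <- age
age_imputed[is.na(age_imputed)] <- median_before

median_after <- median(age_imputed)
var_after <- var(age_imputed)

cat("Median before/after:", median_before, median_after, "\n")
cat("Variance before/after:", var_before, var_after, "\n")
```

The variance drops because every imputed value sits exactly at the center of the distribution.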

Let's code the sex variable as a dummy.

titanic_train$Sex <- ifelse(titanic_train$Sex == "male", 1, 0)

To keep the data simple, let's drop the other columns.

titanic_train <- titanic_train %>% dplyr::select(Survived, Pclass, Sex, Age)

Our data is now ready for analysis, so we can start on the machine learning part. The first step is to split the data in two. The Titanic data already comes split into two sets, so this step is not strictly needed here, but if our data came as a single set we would have to do the split ourselves. Let's see how it is done anyway.

set.seed(123)
train_index <- createDataPartition(titanic_train$Survived, p = 0.7, list = FALSE)
train_data <- titanic_train[train_index, ]
test_data <- titanic_train[-train_index, ]
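
To show what a stratified split does, here is a base-R sketch in the spirit of `createDataPartition()`: it samples 70% of the rows within each outcome class, so both parts keep roughly the same class balance. The data frame `toy` and all names are hypothetical, and caret's own procedure may differ in its details.

```r
set.seed(123)

# Hypothetical data: 100 rows, 60 with outcome 0 and 40 with outcome 1
toy <- data.frame(id = 1:100, outcome = rep(c(0, 1), times = c(60, 40)))

# Sample 70% of the row indices separately within each outcome class
rows_by_class <- split(seq_len(nrow(toy)), toy$outcome)
train_index <- unlist(lapply(rows_by_class, function(rows) {
  rows[sample.int(length(rows), size = round(0.7 * length(rows)))]
}))

toy_train <- toy[train_index, ]
toy_test <- toy[-train_index, ]

cat("Train rows:", nrow(toy_train), " Test rows:", nrow(toy_test), "\n")
cat("Share of outcome 1 in train:", mean(toy_train$outcome), "\n")
```

Sampling within classes matters most when one class is rare, because a purely random split could leave the test set with very few cases of it.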

Let's fit our model. As we can see, the model is simply a logistic regression. Let's look at the results.

model_log <- train(Survived ~ ., data = train_data, method = "glm", family = "binomial")
## Warning in train.default(x, y, weights = w, ...): You are trying to do
## regression and your outcome only has two possible values Are you trying to do
## classification? If so, use a 2 level factor as your outcome column.
summary(model_log)
## 
## Call:
## NULL
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.989959   0.547376   9.116  < 2e-16 ***
## Pclass      -1.168618   0.142408  -8.206 2.28e-16 ***
## Sex         -2.657501   0.225041 -11.809  < 2e-16 ***
## Age         -0.040044   0.009014  -4.442 8.90e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 833.37  on 623  degrees of freedom
## Residual deviance: 564.30  on 620  degrees of freedom
## AIC: 572.3
## 
## Number of Fisher Scoring iterations: 5

Let's examine our predictions.

pred_log_prob <- predict(model_log, test_data)
pred_log <- ifelse(pred_log_prob > 0.5, 1, 0)
confusionMatrix(as.factor(pred_log), as.factor(test_data$Survived))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 133  27
##          1  34  73
##                                           
##                Accuracy : 0.7715          
##                  95% CI : (0.7164, 0.8205)
##     No Information Rate : 0.6255          
##     P-Value [Acc > NIR] : 2.334e-07       
##                                           
##                   Kappa : 0.5191          
##                                           
##  Mcnemar's Test P-Value : 0.4424          
##                                           
##             Sensitivity : 0.7964          
##             Specificity : 0.7300          
##          Pos Pred Value : 0.8313          
##          Neg Pred Value : 0.6822          
##              Prevalence : 0.6255          
##          Detection Rate : 0.4981          
##    Detection Prevalence : 0.5993          
##       Balanced Accuracy : 0.7632          
##                                           
##        'Positive' Class : 0               
## 

Let's check accuracy in an alternative way.

accuracy_log <- mean(pred_log == test_data$Survived)
print(accuracy_log)
## [1] 0.7715356

Especially when the classes are imbalanced, we can use metrics such as precision, recall, and F1.

  • Precision shows what share of the predicted positives are actually positive.
  • Recall, or sensitivity, shows what share of the actual positives were predicted correctly.
  • The F1 score balances the two as their harmonic mean.

precision_log <- posPredValue(as.factor(pred_log), as.factor(test_data$Survived), positive = "1")
recall_log <- sensitivity(as.factor(pred_log), as.factor(test_data$Survived), positive = "1")
f1_log <- 2 * ((precision_log * recall_log) / (precision_log + recall_log))
cat("Precision: ", precision_log, "\n")
## Precision:  0.682243
cat("Recall: ", recall_log, "\n")
## Recall:  0.73
cat("F1-Score: ", f1_log, "\n")
## F1-Score:  0.705314
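
These numbers can be checked by hand from the confusion matrix printed earlier, treating class 1 (survived) as the positive class. The sketch below uses only base R; the variable names are illustrative.

```r
# Counts read off the confusion matrix above, with class 1 as "positive"
tp <- 73   # predicted 1, actually 1
fp <- 34   # predicted 1, actually 0
fn <- 27   # predicted 0, actually 1
tn <- 133  # predicted 0, actually 0

accuracy <- (tp + tn) / (tp + tn + fp + fn)
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
f1 <- 2 * precision * recall / (precision + recall)

cat("Accuracy: ", accuracy, "\n")
cat("Precision:", precision, "\n")
cat("Recall:   ", recall, "\n")
cat("F1-Score: ", f1, "\n")
```

Note that `confusionMatrix()` above treated class 0 as the positive class by default, which is why its sensitivity differs from the recall computed here.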

Finally, we can also gauge the model's strength from the ROC curve and the AUC. The closer the curve is to the top-left corner, the better the model.

library(pROC)
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
pred_log_prob <- predict(model_log, test_data, type = "raw")
roc_curve <- roc(test_data$Survived, pred_log_prob)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve, col = "blue", main = "ROC Curve for Logistic Regression")

auc_log <- auc(roc_curve)
cat("AUC: ", auc_log, "\n")
## AUC:  0.8449401
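
AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The base-R sketch below computes it from ranks on made-up data; `auc_manual()` is a hypothetical helper, not a pROC function.

```r
# Rank-based AUC: how often a positive case outranks a negative case
auc_manual <- function(labels, scores) {
  ranks <- rank(scores)  # tied scores receive average ranks
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(ranks[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical labels and predicted probabilities
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
print(auc_manual(labels, scores))
```

In this toy example three of the four positive-negative pairs are ordered correctly, so the function returns 0.75.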

As a rule of thumb, we interpret AUC as follows.

  • AUC = 0.5 means the model is no better than random guessing.
  • AUC > 0.7 indicates a good model.
  • AUC > 0.9 indicates a near-perfect model.

The next dataset will be used to predict the value of houses in Boston.

library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:tidygraph':
## 
##     select
## The following object is masked from 'package:terra':
## 
##     area
## The following objects are masked from 'package:raster':
## 
##     area, select
## The following object is masked from 'package:dplyr':
## 
##     select
data("Boston")
Boston
##         crim    zn indus chas    nox    rm   age     dis rad tax ptratio  black
## 1    0.00632  18.0  2.31    0 0.5380 6.575  65.2  4.0900   1 296    15.3 396.90
## 2    0.02731   0.0  7.07    0 0.4690 6.421  78.9  4.9671   2 242    17.8 396.90
## 3    0.02729   0.0  7.07    0 0.4690 7.185  61.1  4.9671   2 242    17.8 392.83
## 4    0.03237   0.0  2.18    0 0.4580 6.998  45.8  6.0622   3 222    18.7 394.63
## 5    0.06905   0.0  2.18    0 0.4580 7.147  54.2  6.0622   3 222    18.7 396.90
## 6    0.02985   0.0  2.18    0 0.4580 6.430  58.7  6.0622   3 222    18.7 394.12
## 7    0.08829  12.5  7.87    0 0.5240 6.012  66.6  5.5605   5 311    15.2 395.60
## 8    0.14455  12.5  7.87    0 0.5240 6.172  96.1  5.9505   5 311    15.2 396.90
## 9    0.21124  12.5  7.87    0 0.5240 5.631 100.0  6.0821   5 311    15.2 386.63
## 10   0.17004  12.5  7.87    0 0.5240 6.004  85.9  6.5921   5 311    15.2 386.71
## 11   0.22489  12.5  7.87    0 0.5240 6.377  94.3  6.3467   5 311    15.2 392.52
## 12   0.11747  12.5  7.87    0 0.5240 6.009  82.9  6.2267   5 311    15.2 396.90
## 13   0.09378  12.5  7.87    0 0.5240 5.889  39.0  5.4509   5 311    15.2 390.50
## 14   0.62976   0.0  8.14    0 0.5380 5.949  61.8  4.7075   4 307    21.0 396.90
## 15   0.63796   0.0  8.14    0 0.5380 6.096  84.5  4.4619   4 307    21.0 380.02
## 16   0.62739   0.0  8.14    0 0.5380 5.834  56.5  4.4986   4 307    21.0 395.62
## 17   1.05393   0.0  8.14    0 0.5380 5.935  29.3  4.4986   4 307    21.0 386.85
## 18   0.78420   0.0  8.14    0 0.5380 5.990  81.7  4.2579   4 307    21.0 386.75
## 19   0.80271   0.0  8.14    0 0.5380 5.456  36.6  3.7965   4 307    21.0 288.99
## 20   0.72580   0.0  8.14    0 0.5380 5.727  69.5  3.7965   4 307    21.0 390.95
## 21   1.25179   0.0  8.14    0 0.5380 5.570  98.1  3.7979   4 307    21.0 376.57
## 22   0.85204   0.0  8.14    0 0.5380 5.965  89.2  4.0123   4 307    21.0 392.53
## 23   1.23247   0.0  8.14    0 0.5380 6.142  91.7  3.9769   4 307    21.0 396.90
## 24   0.98843   0.0  8.14    0 0.5380 5.813 100.0  4.0952   4 307    21.0 394.54
## 25   0.75026   0.0  8.14    0 0.5380 5.924  94.1  4.3996   4 307    21.0 394.33
## 26   0.84054   0.0  8.14    0 0.5380 5.599  85.7  4.4546   4 307    21.0 303.42
## 27   0.67191   0.0  8.14    0 0.5380 5.813  90.3  4.6820   4 307    21.0 376.88
## 28   0.95577   0.0  8.14    0 0.5380 6.047  88.8  4.4534   4 307    21.0 306.38
## 29   0.77299   0.0  8.14    0 0.5380 6.495  94.4  4.4547   4 307    21.0 387.94
## 30   1.00245   0.0  8.14    0 0.5380 6.674  87.3  4.2390   4 307    21.0 380.23
## 31   1.13081   0.0  8.14    0 0.5380 5.713  94.1  4.2330   4 307    21.0 360.17
## 32   1.35472   0.0  8.14    0 0.5380 6.072 100.0  4.1750   4 307    21.0 376.73
## 33   1.38799   0.0  8.14    0 0.5380 5.950  82.0  3.9900   4 307    21.0 232.60
## 34   1.15172   0.0  8.14    0 0.5380 5.701  95.0  3.7872   4 307    21.0 358.77
## 35   1.61282   0.0  8.14    0 0.5380 6.096  96.9  3.7598   4 307    21.0 248.31
## 36   0.06417   0.0  5.96    0 0.4990 5.933  68.2  3.3603   5 279    19.2 396.90
## 37   0.09744   0.0  5.96    0 0.4990 5.841  61.4  3.3779   5 279    19.2 377.56
## 38   0.08014   0.0  5.96    0 0.4990 5.850  41.5  3.9342   5 279    19.2 396.90
## 39   0.17505   0.0  5.96    0 0.4990 5.966  30.2  3.8473   5 279    19.2 393.43
## 40   0.02763  75.0  2.95    0 0.4280 6.595  21.8  5.4011   3 252    18.3 395.63
## 41   0.03359  75.0  2.95    0 0.4280 7.024  15.8  5.4011   3 252    18.3 395.62
## 42   0.12744   0.0  6.91    0 0.4480 6.770   2.9  5.7209   3 233    17.9 385.41
## 43   0.14150   0.0  6.91    0 0.4480 6.169   6.6  5.7209   3 233    17.9 383.37
## 44   0.15936   0.0  6.91    0 0.4480 6.211   6.5  5.7209   3 233    17.9 394.46
## 45   0.12269   0.0  6.91    0 0.4480 6.069  40.0  5.7209   3 233    17.9 389.39
## 46   0.17142   0.0  6.91    0 0.4480 5.682  33.8  5.1004   3 233    17.9 396.90
## 47   0.18836   0.0  6.91    0 0.4480 5.786  33.3  5.1004   3 233    17.9 396.90
## 48   0.22927   0.0  6.91    0 0.4480 6.030  85.5  5.6894   3 233    17.9 392.74
## 49   0.25387   0.0  6.91    0 0.4480 5.399  95.3  5.8700   3 233    17.9 396.90
## 50   0.21977   0.0  6.91    0 0.4480 5.602  62.0  6.0877   3 233    17.9 396.90
## 51   0.08873  21.0  5.64    0 0.4390 5.963  45.7  6.8147   4 243    16.8 395.56
## 52   0.04337  21.0  5.64    0 0.4390 6.115  63.0  6.8147   4 243    16.8 393.97
## 53   0.05360  21.0  5.64    0 0.4390 6.511  21.1  6.8147   4 243    16.8 396.90
## 54   0.04981  21.0  5.64    0 0.4390 5.998  21.4  6.8147   4 243    16.8 396.90
## 55   0.01360  75.0  4.00    0 0.4100 5.888  47.6  7.3197   3 469    21.1 396.90
## 56   0.01311  90.0  1.22    0 0.4030 7.249  21.9  8.6966   5 226    17.9 395.93
## 57   0.02055  85.0  0.74    0 0.4100 6.383  35.7  9.1876   2 313    17.3 396.90
## 58   0.01432 100.0  1.32    0 0.4110 6.816  40.5  8.3248   5 256    15.1 392.90
## 59   0.15445  25.0  5.13    0 0.4530 6.145  29.2  7.8148   8 284    19.7 390.68
## 60   0.10328  25.0  5.13    0 0.4530 5.927  47.2  6.9320   8 284    19.7 396.90
## 61   0.14932  25.0  5.13    0 0.4530 5.741  66.2  7.2254   8 284    19.7 395.11
## 62   0.17171  25.0  5.13    0 0.4530 5.966  93.4  6.8185   8 284    19.7 378.08
## 63   0.11027  25.0  5.13    0 0.4530 6.456  67.8  7.2255   8 284    19.7 396.90
## 64   0.12650  25.0  5.13    0 0.4530 6.762  43.4  7.9809   8 284    19.7 395.58
## 65   0.01951  17.5  1.38    0 0.4161 7.104  59.5  9.2229   3 216    18.6 393.24
## 66   0.03584  80.0  3.37    0 0.3980 6.290  17.8  6.6115   4 337    16.1 396.90
## 67   0.04379  80.0  3.37    0 0.3980 5.787  31.1  6.6115   4 337    16.1 396.90
## 68   0.05789  12.5  6.07    0 0.4090 5.878  21.4  6.4980   4 345    18.9 396.21
## 69   0.13554  12.5  6.07    0 0.4090 5.594  36.8  6.4980   4 345    18.9 396.90
## 70   0.12816  12.5  6.07    0 0.4090 5.885  33.0  6.4980   4 345    18.9 396.90
## 71   0.08826   0.0 10.81    0 0.4130 6.417   6.6  5.2873   4 305    19.2 383.73
## 72   0.15876   0.0 10.81    0 0.4130 5.961  17.5  5.2873   4 305    19.2 376.94
## 73   0.09164   0.0 10.81    0 0.4130 6.065   7.8  5.2873   4 305    19.2 390.91
## 74   0.19539   0.0 10.81    0 0.4130 6.245   6.2  5.2873   4 305    19.2 377.17
## 75   0.07896   0.0 12.83    0 0.4370 6.273   6.0  4.2515   5 398    18.7 394.92
## 76   0.09512   0.0 12.83    0 0.4370 6.286  45.0  4.5026   5 398    18.7 383.23
## 77   0.10153   0.0 12.83    0 0.4370 6.279  74.5  4.0522   5 398    18.7 373.66
## 78   0.08707   0.0 12.83    0 0.4370 6.140  45.8  4.0905   5 398    18.7 386.96
## 79   0.05646   0.0 12.83    0 0.4370 6.232  53.7  5.0141   5 398    18.7 386.40
## 80   0.08387   0.0 12.83    0 0.4370 5.874  36.6  4.5026   5 398    18.7 396.06
## 81   0.04113  25.0  4.86    0 0.4260 6.727  33.5  5.4007   4 281    19.0 396.90
## 82   0.04462  25.0  4.86    0 0.4260 6.619  70.4  5.4007   4 281    19.0 395.63
## 83   0.03659  25.0  4.86    0 0.4260 6.302  32.2  5.4007   4 281    19.0 396.90
## 84   0.03551  25.0  4.86    0 0.4260 6.167  46.7  5.4007   4 281    19.0 390.64
## 85   0.05059   0.0  4.49    0 0.4490 6.389  48.0  4.7794   3 247    18.5 396.90
## 86   0.05735   0.0  4.49    0 0.4490 6.630  56.1  4.4377   3 247    18.5 392.30
## 87   0.05188   0.0  4.49    0 0.4490 6.015  45.1  4.4272   3 247    18.5 395.99
## 88   0.07151   0.0  4.49    0 0.4490 6.121  56.8  3.7476   3 247    18.5 395.15
## 89   0.05660   0.0  3.41    0 0.4890 7.007  86.3  3.4217   2 270    17.8 396.90
## 90   0.05302   0.0  3.41    0 0.4890 7.079  63.1  3.4145   2 270    17.8 396.06
## 91   0.04684   0.0  3.41    0 0.4890 6.417  66.1  3.0923   2 270    17.8 392.18
## 92   0.03932   0.0  3.41    0 0.4890 6.405  73.9  3.0921   2 270    17.8 393.55
## 93   0.04203  28.0 15.04    0 0.4640 6.442  53.6  3.6659   4 270    18.2 395.01
## 94   0.02875  28.0 15.04    0 0.4640 6.211  28.9  3.6659   4 270    18.2 396.33
## 95   0.04294  28.0 15.04    0 0.4640 6.249  77.3  3.6150   4 270    18.2 396.90
## 96   0.12204   0.0  2.89    0 0.4450 6.625  57.8  3.4952   2 276    18.0 357.98
## 97   0.11504   0.0  2.89    0 0.4450 6.163  69.6  3.4952   2 276    18.0 391.83
## 98   0.12083   0.0  2.89    0 0.4450 8.069  76.0  3.4952   2 276    18.0 396.90
## 99   0.08187   0.0  2.89    0 0.4450 7.820  36.9  3.4952   2 276    18.0 393.53
## 100  0.06860   0.0  2.89    0 0.4450 7.416  62.5  3.4952   2 276    18.0 396.90
## 101  0.14866   0.0  8.56    0 0.5200 6.727  79.9  2.7778   5 384    20.9 394.76
## 102  0.11432   0.0  8.56    0 0.5200 6.781  71.3  2.8561   5 384    20.9 395.58
## 103  0.22876   0.0  8.56    0 0.5200 6.405  85.4  2.7147   5 384    20.9  70.80
## 104  0.21161   0.0  8.56    0 0.5200 6.137  87.4  2.7147   5 384    20.9 394.47
## 105  0.13960   0.0  8.56    0 0.5200 6.167  90.0  2.4210   5 384    20.9 392.69
## 106  0.13262   0.0  8.56    0 0.5200 5.851  96.7  2.1069   5 384    20.9 394.05
## 107  0.17120   0.0  8.56    0 0.5200 5.836  91.9  2.2110   5 384    20.9 395.67
## 108  0.13117   0.0  8.56    0 0.5200 6.127  85.2  2.1224   5 384    20.9 387.69
## 109  0.12802   0.0  8.56    0 0.5200 6.474  97.1  2.4329   5 384    20.9 395.24
## 110  0.26363   0.0  8.56    0 0.5200 6.229  91.2  2.5451   5 384    20.9 391.23
## 111  0.10793   0.0  8.56    0 0.5200 6.195  54.4  2.7778   5 384    20.9 393.49
## 112  0.10084   0.0 10.01    0 0.5470 6.715  81.6  2.6775   6 432    17.8 395.59
## 113  0.12329   0.0 10.01    0 0.5470 5.913  92.9  2.3534   6 432    17.8 394.95
## 114  0.22212   0.0 10.01    0 0.5470 6.092  95.4  2.5480   6 432    17.8 396.90
## 115  0.14231   0.0 10.01    0 0.5470 6.254  84.2  2.2565   6 432    17.8 388.74
## 116  0.17134   0.0 10.01    0 0.5470 5.928  88.2  2.4631   6 432    17.8 344.91
## 117  0.13158   0.0 10.01    0 0.5470 6.176  72.5  2.7301   6 432    17.8 393.30
## 118  0.15098   0.0 10.01    0 0.5470 6.021  82.6  2.7474   6 432    17.8 394.51
## 119  0.13058   0.0 10.01    0 0.5470 5.872  73.1  2.4775   6 432    17.8 338.63
## 120  0.14476   0.0 10.01    0 0.5470 5.731  65.2  2.7592   6 432    17.8 391.50
## 121  0.06899   0.0 25.65    0 0.5810 5.870  69.7  2.2577   2 188    19.1 389.15
## 122  0.07165   0.0 25.65    0 0.5810 6.004  84.1  2.1974   2 188    19.1 377.67
## 123  0.09299   0.0 25.65    0 0.5810 5.961  92.9  2.0869   2 188    19.1 378.09
## 124  0.15038   0.0 25.65    0 0.5810 5.856  97.0  1.9444   2 188    19.1 370.31
## 125  0.09849   0.0 25.65    0 0.5810 5.879  95.8  2.0063   2 188    19.1 379.38
## 126  0.16902   0.0 25.65    0 0.5810 5.986  88.4  1.9929   2 188    19.1 385.02
## 127  0.38735   0.0 25.65    0 0.5810 5.613  95.6  1.7572   2 188    19.1 359.29
## 128  0.25915   0.0 21.89    0 0.6240 5.693  96.0  1.7883   4 437    21.2 392.11
## 129  0.32543   0.0 21.89    0 0.6240 6.431  98.8  1.8125   4 437    21.2 396.90
## 130  0.88125   0.0 21.89    0 0.6240 5.637  94.7  1.9799   4 437    21.2 396.90
## 131  0.34006   0.0 21.89    0 0.6240 6.458  98.9  2.1185   4 437    21.2 395.04
## 132  1.19294   0.0 21.89    0 0.6240 6.326  97.7  2.2710   4 437    21.2 396.90
## 133  0.59005   0.0 21.89    0 0.6240 6.372  97.9  2.3274   4 437    21.2 385.76
## 134  0.32982   0.0 21.89    0 0.6240 5.822  95.4  2.4699   4 437    21.2 388.69
## 135  0.97617   0.0 21.89    0 0.6240 5.757  98.4  2.3460   4 437    21.2 262.76
## 136  0.55778   0.0 21.89    0 0.6240 6.335  98.2  2.1107   4 437    21.2 394.67
## 137  0.32264   0.0 21.89    0 0.6240 5.942  93.5  1.9669   4 437    21.2 378.25
## 138  0.35233   0.0 21.89    0 0.6240 6.454  98.4  1.8498   4 437    21.2 394.08
## 139  0.24980   0.0 21.89    0 0.6240 5.857  98.2  1.6686   4 437    21.2 392.04
## 140  0.54452   0.0 21.89    0 0.6240 6.151  97.9  1.6687   4 437    21.2 396.90
## 141  0.29090   0.0 21.89    0 0.6240 6.174  93.6  1.6119   4 437    21.2 388.08
## 142  1.62864   0.0 21.89    0 0.6240 5.019 100.0  1.4394   4 437    21.2 396.90
## 143  3.32105   0.0 19.58    1 0.8710 5.403 100.0  1.3216   5 403    14.7 396.90
## 144  4.09740   0.0 19.58    0 0.8710 5.468 100.0  1.4118   5 403    14.7 396.90
## 145  2.77974   0.0 19.58    0 0.8710 4.903  97.8  1.3459   5 403    14.7 396.90
## 146  2.37934   0.0 19.58    0 0.8710 6.130 100.0  1.4191   5 403    14.7 172.91
## 147  2.15505   0.0 19.58    0 0.8710 5.628 100.0  1.5166   5 403    14.7 169.27
## 148  2.36862   0.0 19.58    0 0.8710 4.926  95.7  1.4608   5 403    14.7 391.71
## 149  2.33099   0.0 19.58    0 0.8710 5.186  93.8  1.5296   5 403    14.7 356.99
## 150  2.73397   0.0 19.58    0 0.8710 5.597  94.9  1.5257   5 403    14.7 351.85
## 151  1.65660   0.0 19.58    0 0.8710 6.122  97.3  1.6180   5 403    14.7 372.80
## 152  1.49632   0.0 19.58    0 0.8710 5.404 100.0  1.5916   5 403    14.7 341.60
## 153  1.12658   0.0 19.58    1 0.8710 5.012  88.0  1.6102   5 403    14.7 343.28
## 154  2.14918   0.0 19.58    0 0.8710 5.709  98.5  1.6232   5 403    14.7 261.95
## 155  1.41385   0.0 19.58    1 0.8710 6.129  96.0  1.7494   5 403    14.7 321.02
## 156  3.53501   0.0 19.58    1 0.8710 6.152  82.6  1.7455   5 403    14.7  88.01
## 157  2.44668   0.0 19.58    0 0.8710 5.272  94.0  1.7364   5 403    14.7  88.63
## 158  1.22358   0.0 19.58    0 0.6050 6.943  97.4  1.8773   5 403    14.7 363.43
## 159  1.34284   0.0 19.58    0 0.6050 6.066 100.0  1.7573   5 403    14.7 353.89
## 160  1.42502   0.0 19.58    0 0.8710 6.510 100.0  1.7659   5 403    14.7 364.31
## 161  1.27346   0.0 19.58    1 0.6050 6.250  92.6  1.7984   5 403    14.7 338.92
## 162  1.46336   0.0 19.58    0 0.6050 7.489  90.8  1.9709   5 403    14.7 374.43
## 163  1.83377   0.0 19.58    1 0.6050 7.802  98.2  2.0407   5 403    14.7 389.61
## 164  1.51902   0.0 19.58    1 0.6050 8.375  93.9  2.1620   5 403    14.7 388.45
## 165  2.24236   0.0 19.58    0 0.6050 5.854  91.8  2.4220   5 403    14.7 395.11
## 166  2.92400   0.0 19.58    0 0.6050 6.101  93.0  2.2834   5 403    14.7 240.16
## 167  2.01019   0.0 19.58    0 0.6050 7.929  96.2  2.0459   5 403    14.7 369.30
## 168  1.80028   0.0 19.58    0 0.6050 5.877  79.2  2.4259   5 403    14.7 227.61
## 169  2.30040   0.0 19.58    0 0.6050 6.319  96.1  2.1000   5 403    14.7 297.09
## 170  2.44953   0.0 19.58    0 0.6050 6.402  95.2  2.2625   5 403    14.7 330.04
## 171  1.20742   0.0 19.58    0 0.6050 5.875  94.6  2.4259   5 403    14.7 292.29
## 172  2.31390   0.0 19.58    0 0.6050 5.880  97.3  2.3887   5 403    14.7 348.13
## 173  0.13914   0.0  4.05    0 0.5100 5.572  88.5  2.5961   5 296    16.6 396.90
## 174  0.09178   0.0  4.05    0 0.5100 6.416  84.1  2.6463   5 296    16.6 395.50
## 175  0.08447   0.0  4.05    0 0.5100 5.859  68.7  2.7019   5 296    16.6 393.23
## 176  0.06664   0.0  4.05    0 0.5100 6.546  33.1  3.1323   5 296    16.6 390.96
## 177  0.07022   0.0  4.05    0 0.5100 6.020  47.2  3.5549   5 296    16.6 393.23
## 178  0.05425   0.0  4.05    0 0.5100 6.315  73.4  3.3175   5 296    16.6 395.60
## 179  0.06642   0.0  4.05    0 0.5100 6.860  74.4  2.9153   5 296    16.6 391.27
## 180  0.05780   0.0  2.46    0 0.4880 6.980  58.4  2.8290   3 193    17.8 396.90
## 181  0.06588   0.0  2.46    0 0.4880 7.765  83.3  2.7410   3 193    17.8 395.56
## 182  0.06888   0.0  2.46    0 0.4880 6.144  62.2  2.5979   3 193    17.8 396.90
## 183  0.09103   0.0  2.46    0 0.4880 7.155  92.2  2.7006   3 193    17.8 394.12
## 184  0.10008   0.0  2.46    0 0.4880 6.563  95.6  2.8470   3 193    17.8 396.90
## 185  0.08308   0.0  2.46    0 0.4880 5.604  89.8  2.9879   3 193    17.8 391.00
## 186  0.06047   0.0  2.46    0 0.4880 6.153  68.8  3.2797   3 193    17.8 387.11
## 187  0.05602   0.0  2.46    0 0.4880 7.831  53.6  3.1992   3 193    17.8 392.63
## 188  0.07875  45.0  3.44    0 0.4370 6.782  41.1  3.7886   5 398    15.2 393.87
## 189  0.12579  45.0  3.44    0 0.4370 6.556  29.1  4.5667   5 398    15.2 382.84
## 190  0.08370  45.0  3.44    0 0.4370 7.185  38.9  4.5667   5 398    15.2 396.90
## 191  0.09068  45.0  3.44    0 0.4370 6.951  21.5  6.4798   5 398    15.2 377.68
## 192  0.06911  45.0  3.44    0 0.4370 6.739  30.8  6.4798   5 398    15.2 389.71
## 193  0.08664  45.0  3.44    0 0.4370 7.178  26.3  6.4798   5 398    15.2 390.49
## 194  0.02187  60.0  2.93    0 0.4010 6.800   9.9  6.2196   1 265    15.6 393.37
## 195  0.01439  60.0  2.93    0 0.4010 6.604  18.8  6.2196   1 265    15.6 376.70
## 196  0.01381  80.0  0.46    0 0.4220 7.875  32.0  5.6484   4 255    14.4 394.23
## 197  0.04011  80.0  1.52    0 0.4040 7.287  34.1  7.3090   2 329    12.6 396.90
## 198  0.04666  80.0  1.52    0 0.4040 7.107  36.6  7.3090   2 329    12.6 354.31
## 199  0.03768  80.0  1.52    0 0.4040 7.274  38.3  7.3090   2 329    12.6 392.20
## 200  0.03150  95.0  1.47    0 0.4030 6.975  15.3  7.6534   3 402    17.0 396.90
## 201  0.01778  95.0  1.47    0 0.4030 7.135  13.9  7.6534   3 402    17.0 384.30
## 202  0.03445  82.5  2.03    0 0.4150 6.162  38.4  6.2700   2 348    14.7 393.77
## 203  0.02177  82.5  2.03    0 0.4150 7.610  15.7  6.2700   2 348    14.7 395.38
## 204  0.03510  95.0  2.68    0 0.4161 7.853  33.2  5.1180   4 224    14.7 392.78
## 205  0.02009  95.0  2.68    0 0.4161 8.034  31.9  5.1180   4 224    14.7 390.55
## 206  0.13642   0.0 10.59    0 0.4890 5.891  22.3  3.9454   4 277    18.6 396.90
## 207  0.22969   0.0 10.59    0 0.4890 6.326  52.5  4.3549   4 277    18.6 394.87
## 208  0.25199   0.0 10.59    0 0.4890 5.783  72.7  4.3549   4 277    18.6 389.43
## 209  0.13587   0.0 10.59    1 0.4890 6.064  59.1  4.2392   4 277    18.6 381.32
## 210  0.43571   0.0 10.59    1 0.4890 5.344 100.0  3.8750   4 277    18.6 396.90
## 211  0.17446   0.0 10.59    1 0.4890 5.960  92.1  3.8771   4 277    18.6 393.25
## 212  0.37578   0.0 10.59    1 0.4890 5.404  88.6  3.6650   4 277    18.6 395.24
## 213  0.21719   0.0 10.59    1 0.4890 5.807  53.8  3.6526   4 277    18.6 390.94
## 214  0.14052   0.0 10.59    0 0.4890 6.375  32.3  3.9454   4 277    18.6 385.81
## 215  0.28955   0.0 10.59    0 0.4890 5.412   9.8  3.5875   4 277    18.6 348.93
## 216  0.19802   0.0 10.59    0 0.4890 6.182  42.4  3.9454   4 277    18.6 393.63
## 217  0.04560   0.0 13.89    1 0.5500 5.888  56.0  3.1121   5 276    16.4 392.80
## 218  0.07013   0.0 13.89    0 0.5500 6.642  85.1  3.4211   5 276    16.4 392.78
## 219  0.11069   0.0 13.89    1 0.5500 5.951  93.8  2.8893   5 276    16.4 396.90
## 220  0.11425   0.0 13.89    1 0.5500 6.373  92.4  3.3633   5 276    16.4 393.74
## 221  0.35809   0.0  6.20    1 0.5070 6.951  88.5  2.8617   8 307    17.4 391.70
## 222  0.40771   0.0  6.20    1 0.5070 6.164  91.3  3.0480   8 307    17.4 395.24
## 223  0.62356   0.0  6.20    1 0.5070 6.879  77.7  3.2721   8 307    17.4 390.39
## 224  0.61470   0.0  6.20    0 0.5070 6.618  80.8  3.2721   8 307    17.4 396.90
## 225  0.31533   0.0  6.20    0 0.5040 8.266  78.3  2.8944   8 307    17.4 385.05
## 226  0.52693   0.0  6.20    0 0.5040 8.725  83.0  2.8944   8 307    17.4 382.00
## 227  0.38214   0.0  6.20    0 0.5040 8.040  86.5  3.2157   8 307    17.4 387.38
## 228  0.41238   0.0  6.20    0 0.5040 7.163  79.9  3.2157   8 307    17.4 372.08
## 229  0.29819   0.0  6.20    0 0.5040 7.686  17.0  3.3751   8 307    17.4 377.51
## 230  0.44178   0.0  6.20    0 0.5040 6.552  21.4  3.3751   8 307    17.4 380.34
## 231  0.53700   0.0  6.20    0 0.5040 5.981  68.1  3.6715   8 307    17.4 378.35
## 232  0.46296   0.0  6.20    0 0.5040 7.412  76.9  3.6715   8 307    17.4 376.14
## 233  0.57529   0.0  6.20    0 0.5070 8.337  73.3  3.8384   8 307    17.4 385.91
## 234  0.33147   0.0  6.20    0 0.5070 8.247  70.4  3.6519   8 307    17.4 378.95
## 235  0.44791   0.0  6.20    1 0.5070 6.726  66.5  3.6519   8 307    17.4 360.20
## 236  0.33045   0.0  6.20    0 0.5070 6.086  61.5  3.6519   8 307    17.4 376.75
## 237  0.52058   0.0  6.20    1 0.5070 6.631  76.5  4.1480   8 307    17.4 388.45
## 238  0.51183   0.0  6.20    0 0.5070 7.358  71.6  4.1480   8 307    17.4 390.07
## 239  0.08244  30.0  4.93    0 0.4280 6.481  18.5  6.1899   6 300    16.6 379.41
## 240  0.09252  30.0  4.93    0 0.4280 6.606  42.2  6.1899   6 300    16.6 383.78
## 241  0.11329  30.0  4.93    0 0.4280 6.897  54.3  6.3361   6 300    16.6 391.25
## 242  0.10612  30.0  4.93    0 0.4280 6.095  65.1  6.3361   6 300    16.6 394.62
## 243  0.10290  30.0  4.93    0 0.4280 6.358  52.9  7.0355   6 300    16.6 372.75
## 244  0.12757  30.0  4.93    0 0.4280 6.393   7.8  7.0355   6 300    16.6 374.71
## 245  0.20608  22.0  5.86    0 0.4310 5.593  76.5  7.9549   7 330    19.1 372.49
## 246  0.19133  22.0  5.86    0 0.4310 5.605  70.2  7.9549   7 330    19.1 389.13
## 247  0.33983  22.0  5.86    0 0.4310 6.108  34.9  8.0555   7 330    19.1 390.18
## 248  0.19657  22.0  5.86    0 0.4310 6.226  79.2  8.0555   7 330    19.1 376.14
## 249  0.16439  22.0  5.86    0 0.4310 6.433  49.1  7.8265   7 330    19.1 374.71
## 250  0.19073  22.0  5.86    0 0.4310 6.718  17.5  7.8265   7 330    19.1 393.74
## 251  0.14030  22.0  5.86    0 0.4310 6.487  13.0  7.3967   7 330    19.1 396.28
## 252  0.21409  22.0  5.86    0 0.4310 6.438   8.9  7.3967   7 330    19.1 377.07
## 253  0.08221  22.0  5.86    0 0.4310 6.957   6.8  8.9067   7 330    19.1 386.09
## 254  0.36894  22.0  5.86    0 0.4310 8.259   8.4  8.9067   7 330    19.1 396.90
## 255  0.04819  80.0  3.64    0 0.3920 6.108  32.0  9.2203   1 315    16.4 392.89
## 256  0.03548  80.0  3.64    0 0.3920 5.876  19.1  9.2203   1 315    16.4 395.18
## 257  0.01538  90.0  3.75    0 0.3940 7.454  34.2  6.3361   3 244    15.9 386.34
## 258  0.61154  20.0  3.97    0 0.6470 8.704  86.9  1.8010   5 264    13.0 389.70
## 259  0.66351  20.0  3.97    0 0.6470 7.333 100.0  1.8946   5 264    13.0 383.29
## 260  0.65665  20.0  3.97    0 0.6470 6.842 100.0  2.0107   5 264    13.0 391.93
## 261  0.54011  20.0  3.97    0 0.6470 7.203  81.8  2.1121   5 264    13.0 392.80
## 262  0.53412  20.0  3.97    0 0.6470 7.520  89.4  2.1398   5 264    13.0 388.37
## 263  0.52014  20.0  3.97    0 0.6470 8.398  91.5  2.2885   5 264    13.0 386.86
## 264  0.82526  20.0  3.97    0 0.6470 7.327  94.5  2.0788   5 264    13.0 393.42
## 265  0.55007  20.0  3.97    0 0.6470 7.206  91.6  1.9301   5 264    13.0 387.89
## 266  0.76162  20.0  3.97    0 0.6470 5.560  62.8  1.9865   5 264    13.0 392.40
## 267  0.78570  20.0  3.97    0 0.6470 7.014  84.6  2.1329   5 264    13.0 384.07
## 268  0.57834  20.0  3.97    0 0.5750 8.297  67.0  2.4216   5 264    13.0 384.54
## 269  0.54050  20.0  3.97    0 0.5750 7.470  52.6  2.8720   5 264    13.0 390.30
## 270  0.09065  20.0  6.96    1 0.4640 5.920  61.5  3.9175   3 223    18.6 391.34
## 271  0.29916  20.0  6.96    0 0.4640 5.856  42.1  4.4290   3 223    18.6 388.65
## 272  0.16211  20.0  6.96    0 0.4640 6.240  16.3  4.4290   3 223    18.6 396.90
## 273  0.11460  20.0  6.96    0 0.4640 6.538  58.7  3.9175   3 223    18.6 394.96
## 274  0.22188  20.0  6.96    1 0.4640 7.691  51.8  4.3665   3 223    18.6 390.77
## 275  0.05644  40.0  6.41    1 0.4470 6.758  32.9  4.0776   4 254    17.6 396.90
## 276  0.09604  40.0  6.41    0 0.4470 6.854  42.8  4.2673   4 254    17.6 396.90
## 277  0.10469  40.0  6.41    1 0.4470 7.267  49.0  4.7872   4 254    17.6 389.25
## 278  0.06127  40.0  6.41    1 0.4470 6.826  27.6  4.8628   4 254    17.6 393.45
## 279  0.07978  40.0  6.41    0 0.4470 6.482  32.1  4.1403   4 254    17.6 396.90
## 280  0.21038  20.0  3.33    0 0.4429 6.812  32.2  4.1007   5 216    14.9 396.90
## 281  0.03578  20.0  3.33    0 0.4429 7.820  64.5  4.6947   5 216    14.9 387.31
## 282  0.03705  20.0  3.33    0 0.4429 6.968  37.2  5.2447   5 216    14.9 392.23
## 283  0.06129  20.0  3.33    1 0.4429 7.645  49.7  5.2119   5 216    14.9 377.07
## 284  0.01501  90.0  1.21    1 0.4010 7.923  24.8  5.8850   1 198    13.6 395.52
## 285  0.00906  90.0  2.97    0 0.4000 7.088  20.8  7.3073   1 285    15.3 394.72
## 286  0.01096  55.0  2.25    0 0.3890 6.453  31.9  7.3073   1 300    15.3 394.72
## 287  0.01965  80.0  1.76    0 0.3850 6.230  31.5  9.0892   1 241    18.2 341.60
## 288  0.03871  52.5  5.32    0 0.4050 6.209  31.3  7.3172   6 293    16.6 396.90
## 289  0.04590  52.5  5.32    0 0.4050 6.315  45.6  7.3172   6 293    16.6 396.90
## 290  0.04297  52.5  5.32    0 0.4050 6.565  22.9  7.3172   6 293    16.6 371.72
## 291  0.03502  80.0  4.95    0 0.4110 6.861  27.9  5.1167   4 245    19.2 396.90
## 292  0.07886  80.0  4.95    0 0.4110 7.148  27.7  5.1167   4 245    19.2 396.90
## 293  0.03615  80.0  4.95    0 0.4110 6.630  23.4  5.1167   4 245    19.2 396.90
## 294  0.08265   0.0 13.92    0 0.4370 6.127  18.4  5.5027   4 289    16.0 396.90
## 295  0.08199   0.0 13.92    0 0.4370 6.009  42.3  5.5027   4 289    16.0 396.90
## 296  0.12932   0.0 13.92    0 0.4370 6.678  31.1  5.9604   4 289    16.0 396.90
## 297  0.05372   0.0 13.92    0 0.4370 6.549  51.0  5.9604   4 289    16.0 392.85
## 298  0.14103   0.0 13.92    0 0.4370 5.790  58.0  6.3200   4 289    16.0 396.90
## 299  0.06466  70.0  2.24    0 0.4000 6.345  20.1  7.8278   5 358    14.8 368.24
## 300  0.05561  70.0  2.24    0 0.4000 7.041  10.0  7.8278   5 358    14.8 371.58
## 301  0.04417  70.0  2.24    0 0.4000 6.871  47.4  7.8278   5 358    14.8 390.86
## 302  0.03537  34.0  6.09    0 0.4330 6.590  40.4  5.4917   7 329    16.1 395.75
## 303  0.09266  34.0  6.09    0 0.4330 6.495  18.4  5.4917   7 329    16.1 383.61
## 304  0.10000  34.0  6.09    0 0.4330 6.982  17.7  5.4917   7 329    16.1 390.43
## 305  0.05515  33.0  2.18    0 0.4720 7.236  41.1  4.0220   7 222    18.4 393.68
## 306  0.05479  33.0  2.18    0 0.4720 6.616  58.1  3.3700   7 222    18.4 393.36
## 307  0.07503  33.0  2.18    0 0.4720 7.420  71.9  3.0992   7 222    18.4 396.90
## 308  0.04932  33.0  2.18    0 0.4720 6.849  70.3  3.1827   7 222    18.4 396.90
## 309  0.49298   0.0  9.90    0 0.5440 6.635  82.5  3.3175   4 304    18.4 396.90
## 310  0.34940   0.0  9.90    0 0.5440 5.972  76.7  3.1025   4 304    18.4 396.24
## 311  2.63548   0.0  9.90    0 0.5440 4.973  37.8  2.5194   4 304    18.4 350.45
## 312  0.79041   0.0  9.90    0 0.5440 6.122  52.8  2.6403   4 304    18.4 396.90
## 313  0.26169   0.0  9.90    0 0.5440 6.023  90.4  2.8340   4 304    18.4 396.30
## 314  0.26938   0.0  9.90    0 0.5440 6.266  82.8  3.2628   4 304    18.4 393.39
## 315  0.36920   0.0  9.90    0 0.5440 6.567  87.3  3.6023   4 304    18.4 395.69
## 316  0.25356   0.0  9.90    0 0.5440 5.705  77.7  3.9450   4 304    18.4 396.42
## 317  0.31827   0.0  9.90    0 0.5440 5.914  83.2  3.9986   4 304    18.4 390.70
## 318  0.24522   0.0  9.90    0 0.5440 5.782  71.7  4.0317   4 304    18.4 396.90
## 319  0.40202   0.0  9.90    0 0.5440 6.382  67.2  3.5325   4 304    18.4 395.21
## 320  0.47547   0.0  9.90    0 0.5440 6.113  58.8  4.0019   4 304    18.4 396.23
## 321  0.16760   0.0  7.38    0 0.4930 6.426  52.3  4.5404   5 287    19.6 396.90
## 322  0.18159   0.0  7.38    0 0.4930 6.376  54.3  4.5404   5 287    19.6 396.90
## 323  0.35114   0.0  7.38    0 0.4930 6.041  49.9  4.7211   5 287    19.6 396.90
## 324  0.28392   0.0  7.38    0 0.4930 5.708  74.3  4.7211   5 287    19.6 391.13
## 325  0.34109   0.0  7.38    0 0.4930 6.415  40.1  4.7211   5 287    19.6 396.90
## 326  0.19186   0.0  7.38    0 0.4930 6.431  14.7  5.4159   5 287    19.6 393.68
## 327  0.30347   0.0  7.38    0 0.4930 6.312  28.9  5.4159   5 287    19.6 396.90
## 328  0.24103   0.0  7.38    0 0.4930 6.083  43.7  5.4159   5 287    19.6 396.90
## 329  0.06617   0.0  3.24    0 0.4600 5.868  25.8  5.2146   4 430    16.9 382.44
## 330  0.06724   0.0  3.24    0 0.4600 6.333  17.2  5.2146   4 430    16.9 375.21
## 331  0.04544   0.0  3.24    0 0.4600 6.144  32.2  5.8736   4 430    16.9 368.57
## 332  0.05023  35.0  6.06    0 0.4379 5.706  28.4  6.6407   1 304    16.9 394.02
## 333  0.03466  35.0  6.06    0 0.4379 6.031  23.3  6.6407   1 304    16.9 362.25
## 334  0.05083   0.0  5.19    0 0.5150 6.316  38.1  6.4584   5 224    20.2 389.71
## 335  0.03738   0.0  5.19    0 0.5150 6.310  38.5  6.4584   5 224    20.2 389.40
## 336  0.03961   0.0  5.19    0 0.5150 6.037  34.5  5.9853   5 224    20.2 396.90
## 337  0.03427   0.0  5.19    0 0.5150 5.869  46.3  5.2311   5 224    20.2 396.90
## 338  0.03041   0.0  5.19    0 0.5150 5.895  59.6  5.6150   5 224    20.2 394.81
## 339  0.03306   0.0  5.19    0 0.5150 6.059  37.3  4.8122   5 224    20.2 396.14
## 340  0.05497   0.0  5.19    0 0.5150 5.985  45.4  4.8122   5 224    20.2 396.90
## 341  0.06151   0.0  5.19    0 0.5150 5.968  58.5  4.8122   5 224    20.2 396.90
## 342  0.01301  35.0  1.52    0 0.4420 7.241  49.3  7.0379   1 284    15.5 394.74
## 343  0.02498   0.0  1.89    0 0.5180 6.540  59.7  6.2669   1 422    15.9 389.96
## 344  0.02543  55.0  3.78    0 0.4840 6.696  56.4  5.7321   5 370    17.6 396.90
## 345  0.03049  55.0  3.78    0 0.4840 6.874  28.1  6.4654   5 370    17.6 387.97
## 346  0.03113   0.0  4.39    0 0.4420 6.014  48.5  8.0136   3 352    18.8 385.64
## 347  0.06162   0.0  4.39    0 0.4420 5.898  52.3  8.0136   3 352    18.8 364.61
## 348  0.01870  85.0  4.15    0 0.4290 6.516  27.7  8.5353   4 351    17.9 392.43
## 349  0.01501  80.0  2.01    0 0.4350 6.635  29.7  8.3440   4 280    17.0 390.94
## 350  0.02899  40.0  1.25    0 0.4290 6.939  34.5  8.7921   1 335    19.7 389.85
## 351  0.06211  40.0  1.25    0 0.4290 6.490  44.4  8.7921   1 335    19.7 396.90
## 352  0.07950  60.0  1.69    0 0.4110 6.579  35.9 10.7103   4 411    18.3 370.78
## 353  0.07244  60.0  1.69    0 0.4110 5.884  18.5 10.7103   4 411    18.3 392.33
## 354  0.01709  90.0  2.02    0 0.4100 6.728  36.1 12.1265   5 187    17.0 384.46
## 355  0.04301  80.0  1.91    0 0.4130 5.663  21.9 10.5857   4 334    22.0 382.80
## 356  0.10659  80.0  1.91    0 0.4130 5.936  19.5 10.5857   4 334    22.0 376.04
## 357  8.98296   0.0 18.10    1 0.7700 6.212  97.4  2.1222  24 666    20.2 377.73
## 358  3.84970   0.0 18.10    1 0.7700 6.395  91.0  2.5052  24 666    20.2 391.34
## 359  5.20177   0.0 18.10    1 0.7700 6.127  83.4  2.7227  24 666    20.2 395.43
## 360  4.26131   0.0 18.10    0 0.7700 6.112  81.3  2.5091  24 666    20.2 390.74
## 361  4.54192   0.0 18.10    0 0.7700 6.398  88.0  2.5182  24 666    20.2 374.56
## 362  3.83684   0.0 18.10    0 0.7700 6.251  91.1  2.2955  24 666    20.2 350.65
## 363  3.67822   0.0 18.10    0 0.7700 5.362  96.2  2.1036  24 666    20.2 380.79
## 364  4.22239   0.0 18.10    1 0.7700 5.803  89.0  1.9047  24 666    20.2 353.04
## 365  3.47428   0.0 18.10    1 0.7180 8.780  82.9  1.9047  24 666    20.2 354.55
## 366  4.55587   0.0 18.10    0 0.7180 3.561  87.9  1.6132  24 666    20.2 354.70
## 367  3.69695   0.0 18.10    0 0.7180 4.963  91.4  1.7523  24 666    20.2 316.03
## 368 13.52220   0.0 18.10    0 0.6310 3.863 100.0  1.5106  24 666    20.2 131.42
## 369  4.89822   0.0 18.10    0 0.6310 4.970 100.0  1.3325  24 666    20.2 375.52
## 370  5.66998   0.0 18.10    1 0.6310 6.683  96.8  1.3567  24 666    20.2 375.33
## 371  6.53876   0.0 18.10    1 0.6310 7.016  97.5  1.2024  24 666    20.2 392.05
## 372  9.23230   0.0 18.10    0 0.6310 6.216 100.0  1.1691  24 666    20.2 366.15
## 373  8.26725   0.0 18.10    1 0.6680 5.875  89.6  1.1296  24 666    20.2 347.88
## 374 11.10810   0.0 18.10    0 0.6680 4.906 100.0  1.1742  24 666    20.2 396.90
## 375 18.49820   0.0 18.10    0 0.6680 4.138 100.0  1.1370  24 666    20.2 396.90
## 376 19.60910   0.0 18.10    0 0.6710 7.313  97.9  1.3163  24 666    20.2 396.90
## 377 15.28800   0.0 18.10    0 0.6710 6.649  93.3  1.3449  24 666    20.2 363.02
## 378  9.82349   0.0 18.10    0 0.6710 6.794  98.8  1.3580  24 666    20.2 396.90
## 379 23.64820   0.0 18.10    0 0.6710 6.380  96.2  1.3861  24 666    20.2 396.90
## 380 17.86670   0.0 18.10    0 0.6710 6.223 100.0  1.3861  24 666    20.2 393.74
## 381 88.97620   0.0 18.10    0 0.6710 6.968  91.9  1.4165  24 666    20.2 396.90
## 382 15.87440   0.0 18.10    0 0.6710 6.545  99.1  1.5192  24 666    20.2 396.90
## 383  9.18702   0.0 18.10    0 0.7000 5.536 100.0  1.5804  24 666    20.2 396.90
## 384  7.99248   0.0 18.10    0 0.7000 5.520 100.0  1.5331  24 666    20.2 396.90
## 385 20.08490   0.0 18.10    0 0.7000 4.368  91.2  1.4395  24 666    20.2 285.83
## 386 16.81180   0.0 18.10    0 0.7000 5.277  98.1  1.4261  24 666    20.2 396.90
## 387 24.39380   0.0 18.10    0 0.7000 4.652 100.0  1.4672  24 666    20.2 396.90
## 388 22.59710   0.0 18.10    0 0.7000 5.000  89.5  1.5184  24 666    20.2 396.90
## 389 14.33370   0.0 18.10    0 0.7000 4.880 100.0  1.5895  24 666    20.2 372.92
## 390  8.15174   0.0 18.10    0 0.7000 5.390  98.9  1.7281  24 666    20.2 396.90
## 391  6.96215   0.0 18.10    0 0.7000 5.713  97.0  1.9265  24 666    20.2 394.43
## 392  5.29305   0.0 18.10    0 0.7000 6.051  82.5  2.1678  24 666    20.2 378.38
## 393 11.57790   0.0 18.10    0 0.7000 5.036  97.0  1.7700  24 666    20.2 396.90
## 394  8.64476   0.0 18.10    0 0.6930 6.193  92.6  1.7912  24 666    20.2 396.90
## 395 13.35980   0.0 18.10    0 0.6930 5.887  94.7  1.7821  24 666    20.2 396.90
## 396  8.71675   0.0 18.10    0 0.6930 6.471  98.8  1.7257  24 666    20.2 391.98
## 397  5.87205   0.0 18.10    0 0.6930 6.405  96.0  1.6768  24 666    20.2 396.90
## 398  7.67202   0.0 18.10    0 0.6930 5.747  98.9  1.6334  24 666    20.2 393.10
## 399 38.35180   0.0 18.10    0 0.6930 5.453 100.0  1.4896  24 666    20.2 396.90
## 400  9.91655   0.0 18.10    0 0.6930 5.852  77.8  1.5004  24 666    20.2 338.16
## 401 25.04610   0.0 18.10    0 0.6930 5.987 100.0  1.5888  24 666    20.2 396.90
## 402 14.23620   0.0 18.10    0 0.6930 6.343 100.0  1.5741  24 666    20.2 396.90
## 403  9.59571   0.0 18.10    0 0.6930 6.404 100.0  1.6390  24 666    20.2 376.11
## 404 24.80170   0.0 18.10    0 0.6930 5.349  96.0  1.7028  24 666    20.2 396.90
## 405 41.52920   0.0 18.10    0 0.6930 5.531  85.4  1.6074  24 666    20.2 329.46
## 406 67.92080   0.0 18.10    0 0.6930 5.683 100.0  1.4254  24 666    20.2 384.97
## 407 20.71620   0.0 18.10    0 0.6590 4.138 100.0  1.1781  24 666    20.2 370.22
## 408 11.95110   0.0 18.10    0 0.6590 5.608 100.0  1.2852  24 666    20.2 332.09
## 409  7.40389   0.0 18.10    0 0.5970 5.617  97.9  1.4547  24 666    20.2 314.64
## 410 14.43830   0.0 18.10    0 0.5970 6.852 100.0  1.4655  24 666    20.2 179.36
## 411 51.13580   0.0 18.10    0 0.5970 5.757 100.0  1.4130  24 666    20.2   2.60
## 412 14.05070   0.0 18.10    0 0.5970 6.657 100.0  1.5275  24 666    20.2  35.05
## 413 18.81100   0.0 18.10    0 0.5970 4.628 100.0  1.5539  24 666    20.2  28.79
## 414 28.65580   0.0 18.10    0 0.5970 5.155 100.0  1.5894  24 666    20.2 210.97
## 415 45.74610   0.0 18.10    0 0.6930 4.519 100.0  1.6582  24 666    20.2  88.27
## 416 18.08460   0.0 18.10    0 0.6790 6.434 100.0  1.8347  24 666    20.2  27.25
## 417 10.83420   0.0 18.10    0 0.6790 6.782  90.8  1.8195  24 666    20.2  21.57
## 418 25.94060   0.0 18.10    0 0.6790 5.304  89.1  1.6475  24 666    20.2 127.36
## 419 73.53410   0.0 18.10    0 0.6790 5.957 100.0  1.8026  24 666    20.2  16.45
## 420 11.81230   0.0 18.10    0 0.7180 6.824  76.5  1.7940  24 666    20.2  48.45
## 421 11.08740   0.0 18.10    0 0.7180 6.411 100.0  1.8589  24 666    20.2 318.75
## 422  7.02259   0.0 18.10    0 0.7180 6.006  95.3  1.8746  24 666    20.2 319.98
## 423 12.04820   0.0 18.10    0 0.6140 5.648  87.6  1.9512  24 666    20.2 291.55
## 424  7.05042   0.0 18.10    0 0.6140 6.103  85.1  2.0218  24 666    20.2   2.52
## 425  8.79212   0.0 18.10    0 0.5840 5.565  70.6  2.0635  24 666    20.2   3.65
## 426 15.86030   0.0 18.10    0 0.6790 5.896  95.4  1.9096  24 666    20.2   7.68
## 427 12.24720   0.0 18.10    0 0.5840 5.837  59.7  1.9976  24 666    20.2  24.65
## 428 37.66190   0.0 18.10    0 0.6790 6.202  78.7  1.8629  24 666    20.2  18.82
## 429  7.36711   0.0 18.10    0 0.6790 6.193  78.1  1.9356  24 666    20.2  96.73
## 430  9.33889   0.0 18.10    0 0.6790 6.380  95.6  1.9682  24 666    20.2  60.72
## 431  8.49213   0.0 18.10    0 0.5840 6.348  86.1  2.0527  24 666    20.2  83.45
## 432 10.06230   0.0 18.10    0 0.5840 6.833  94.3  2.0882  24 666    20.2  81.33
## 433  6.44405   0.0 18.10    0 0.5840 6.425  74.8  2.2004  24 666    20.2  97.95
## 434  5.58107   0.0 18.10    0 0.7130 6.436  87.9  2.3158  24 666    20.2 100.19
## 435 13.91340   0.0 18.10    0 0.7130 6.208  95.0  2.2222  24 666    20.2 100.63
## 436 11.16040   0.0 18.10    0 0.7400 6.629  94.6  2.1247  24 666    20.2 109.85
## 437 14.42080   0.0 18.10    0 0.7400 6.461  93.3  2.0026  24 666    20.2  27.49
## 438 15.17720   0.0 18.10    0 0.7400 6.152 100.0  1.9142  24 666    20.2   9.32
## 439 13.67810   0.0 18.10    0 0.7400 5.935  87.9  1.8206  24 666    20.2  68.95
## 440  9.39063   0.0 18.10    0 0.7400 5.627  93.9  1.8172  24 666    20.2 396.90
## 441 22.05110   0.0 18.10    0 0.7400 5.818  92.4  1.8662  24 666    20.2 391.45
## 442  9.72418   0.0 18.10    0 0.7400 6.406  97.2  2.0651  24 666    20.2 385.96
## 443  5.66637   0.0 18.10    0 0.7400 6.219 100.0  2.0048  24 666    20.2 395.69
## 444  9.96654   0.0 18.10    0 0.7400 6.485 100.0  1.9784  24 666    20.2 386.73
## 445 12.80230   0.0 18.10    0 0.7400 5.854  96.6  1.8956  24 666    20.2 240.52
## 446 10.67180   0.0 18.10    0 0.7400 6.459  94.8  1.9879  24 666    20.2  43.06
## 447  6.28807   0.0 18.10    0 0.7400 6.341  96.4  2.0720  24 666    20.2 318.01
## 448  9.92485   0.0 18.10    0 0.7400 6.251  96.6  2.1980  24 666    20.2 388.52
## 449  9.32909   0.0 18.10    0 0.7130 6.185  98.7  2.2616  24 666    20.2 396.90
## 450  7.52601   0.0 18.10    0 0.7130 6.417  98.3  2.1850  24 666    20.2 304.21
## 451  6.71772   0.0 18.10    0 0.7130 6.749  92.6  2.3236  24 666    20.2   0.32
## 452  5.44114   0.0 18.10    0 0.7130 6.655  98.2  2.3552  24 666    20.2 355.29
## 453  5.09017   0.0 18.10    0 0.7130 6.297  91.8  2.3682  24 666    20.2 385.09
## 454  8.24809   0.0 18.10    0 0.7130 7.393  99.3  2.4527  24 666    20.2 375.87
## 455  9.51363   0.0 18.10    0 0.7130 6.728  94.1  2.4961  24 666    20.2   6.68
## 456  4.75237   0.0 18.10    0 0.7130 6.525  86.5  2.4358  24 666    20.2  50.92
## 457  4.66883   0.0 18.10    0 0.7130 5.976  87.9  2.5806  24 666    20.2  10.48
## 458  8.20058   0.0 18.10    0 0.7130 5.936  80.3  2.7792  24 666    20.2   3.50
## 459  7.75223   0.0 18.10    0 0.7130 6.301  83.7  2.7831  24 666    20.2 272.21
## 460  6.80117   0.0 18.10    0 0.7130 6.081  84.4  2.7175  24 666    20.2 396.90
## 461  4.81213   0.0 18.10    0 0.7130 6.701  90.0  2.5975  24 666    20.2 255.23
## 462  3.69311   0.0 18.10    0 0.7130 6.376  88.4  2.5671  24 666    20.2 391.43
## 463  6.65492   0.0 18.10    0 0.7130 6.317  83.0  2.7344  24 666    20.2 396.90
## 464  5.82115   0.0 18.10    0 0.7130 6.513  89.9  2.8016  24 666    20.2 393.82
## 465  7.83932   0.0 18.10    0 0.6550 6.209  65.4  2.9634  24 666    20.2 396.90
## 466  3.16360   0.0 18.10    0 0.6550 5.759  48.2  3.0665  24 666    20.2 334.40
## 467  3.77498   0.0 18.10    0 0.6550 5.952  84.7  2.8715  24 666    20.2  22.01
## 468  4.42228   0.0 18.10    0 0.5840 6.003  94.5  2.5403  24 666    20.2 331.29
## 469 15.57570   0.0 18.10    0 0.5800 5.926  71.0  2.9084  24 666    20.2 368.74
## 470 13.07510   0.0 18.10    0 0.5800 5.713  56.7  2.8237  24 666    20.2 396.90
## 471  4.34879   0.0 18.10    0 0.5800 6.167  84.0  3.0334  24 666    20.2 396.90
## 472  4.03841   0.0 18.10    0 0.5320 6.229  90.7  3.0993  24 666    20.2 395.33
## 473  3.56868   0.0 18.10    0 0.5800 6.437  75.0  2.8965  24 666    20.2 393.37
## 474  4.64689   0.0 18.10    0 0.6140 6.980  67.6  2.5329  24 666    20.2 374.68
## 475  8.05579   0.0 18.10    0 0.5840 5.427  95.4  2.4298  24 666    20.2 352.58
## 476  6.39312   0.0 18.10    0 0.5840 6.162  97.4  2.2060  24 666    20.2 302.76
## 477  4.87141   0.0 18.10    0 0.6140 6.484  93.6  2.3053  24 666    20.2 396.21
## 478 15.02340   0.0 18.10    0 0.6140 5.304  97.3  2.1007  24 666    20.2 349.48
## 479 10.23300   0.0 18.10    0 0.6140 6.185  96.7  2.1705  24 666    20.2 379.70
## 480 14.33370   0.0 18.10    0 0.6140 6.229  88.0  1.9512  24 666    20.2 383.32
## 481  5.82401   0.0 18.10    0 0.5320 6.242  64.7  3.4242  24 666    20.2 396.90
## 482  5.70818   0.0 18.10    0 0.5320 6.750  74.9  3.3317  24 666    20.2 393.07
## 483  5.73116   0.0 18.10    0 0.5320 7.061  77.0  3.4106  24 666    20.2 395.28
## 484  2.81838   0.0 18.10    0 0.5320 5.762  40.3  4.0983  24 666    20.2 392.92
## 485  2.37857   0.0 18.10    0 0.5830 5.871  41.9  3.7240  24 666    20.2 370.73
## 486  3.67367   0.0 18.10    0 0.5830 6.312  51.9  3.9917  24 666    20.2 388.62
## 487  5.69175   0.0 18.10    0 0.5830 6.114  79.8  3.5459  24 666    20.2 392.68
## 488  4.83567   0.0 18.10    0 0.5830 5.905  53.2  3.1523  24 666    20.2 388.22
## 489  0.15086   0.0 27.74    0 0.6090 5.454  92.7  1.8209   4 711    20.1 395.09
## 490  0.18337   0.0 27.74    0 0.6090 5.414  98.3  1.7554   4 711    20.1 344.05
## 491  0.20746   0.0 27.74    0 0.6090 5.093  98.0  1.8226   4 711    20.1 318.43
## 492  0.10574   0.0 27.74    0 0.6090 5.983  98.8  1.8681   4 711    20.1 390.11
## 493  0.11132   0.0 27.74    0 0.6090 5.983  83.5  2.1099   4 711    20.1 396.90
## 494  0.17331   0.0  9.69    0 0.5850 5.707  54.0  2.3817   6 391    19.2 396.90
## 495  0.27957   0.0  9.69    0 0.5850 5.926  42.6  2.3817   6 391    19.2 396.90
## 496  0.17899   0.0  9.69    0 0.5850 5.670  28.8  2.7986   6 391    19.2 393.29
## 497  0.28960   0.0  9.69    0 0.5850 5.390  72.9  2.7986   6 391    19.2 396.90
## 498  0.26838   0.0  9.69    0 0.5850 5.794  70.6  2.8927   6 391    19.2 396.90
## 499  0.23912   0.0  9.69    0 0.5850 6.019  65.3  2.4091   6 391    19.2 396.90
## 500  0.17783   0.0  9.69    0 0.5850 5.569  73.5  2.3999   6 391    19.2 395.77
## 501  0.22438   0.0  9.69    0 0.5850 6.027  79.7  2.4982   6 391    19.2 396.90
## 502  0.06263   0.0 11.93    0 0.5730 6.593  69.1  2.4786   1 273    21.0 391.99
## 503  0.04527   0.0 11.93    0 0.5730 6.120  76.7  2.2875   1 273    21.0 396.90
## 504  0.06076   0.0 11.93    0 0.5730 6.976  91.0  2.1675   1 273    21.0 396.90
## 505  0.10959   0.0 11.93    0 0.5730 6.794  89.3  2.3889   1 273    21.0 393.45
## 506  0.04741   0.0 11.93    0 0.5730 6.030  80.8  2.5050   1 273    21.0 396.90
##     lstat medv
## 1    4.98 24.0
## 2    9.14 21.6
## 3    4.03 34.7
## 4    2.94 33.4
## 5    5.33 36.2
## 6    5.21 28.7
## 7   12.43 22.9
## 8   19.15 27.1
## 9   29.93 16.5
## 10  17.10 18.9
## 11  20.45 15.0
## 12  13.27 18.9
## 13  15.71 21.7
## 14   8.26 20.4
## 15  10.26 18.2
## 16   8.47 19.9
## 17   6.58 23.1
## 18  14.67 17.5
## 19  11.69 20.2
## 20  11.28 18.2
## 21  21.02 13.6
## 22  13.83 19.6
## 23  18.72 15.2
## 24  19.88 14.5
## 25  16.30 15.6
## 26  16.51 13.9
## 27  14.81 16.6
## 28  17.28 14.8
## 29  12.80 18.4
## 30  11.98 21.0
## 31  22.60 12.7
## 32  13.04 14.5
## 33  27.71 13.2
## 34  18.35 13.1
## 35  20.34 13.5
## 36   9.68 18.9
## 37  11.41 20.0
## 38   8.77 21.0
## 39  10.13 24.7
## 40   4.32 30.8
## 41   1.98 34.9
## 42   4.84 26.6
## 43   5.81 25.3
## 44   7.44 24.7
## 45   9.55 21.2
## 46  10.21 19.3
## 47  14.15 20.0
## 48  18.80 16.6
## 49  30.81 14.4
## 50  16.20 19.4
## 51  13.45 19.7
## 52   9.43 20.5
## 53   5.28 25.0
## 54   8.43 23.4
## 55  14.80 18.9
## 56   4.81 35.4
## 57   5.77 24.7
## 58   3.95 31.6
## 59   6.86 23.3
## 60   9.22 19.6
## 61  13.15 18.7
## 62  14.44 16.0
## 63   6.73 22.2
## 64   9.50 25.0
## 65   8.05 33.0
## 66   4.67 23.5
## 67  10.24 19.4
## 68   8.10 22.0
## 69  13.09 17.4
## 70   8.79 20.9
## 71   6.72 24.2
## 72   9.88 21.7
## 73   5.52 22.8
## 74   7.54 23.4
## 75   6.78 24.1
## 76   8.94 21.4
## 77  11.97 20.0
## 78  10.27 20.8
## 79  12.34 21.2
## 80   9.10 20.3
## 81   5.29 28.0
## 82   7.22 23.9
## 83   6.72 24.8
## 84   7.51 22.9
## 85   9.62 23.9
## 86   6.53 26.6
## 87  12.86 22.5
## 88   8.44 22.2
## 89   5.50 23.6
## 90   5.70 28.7
## 91   8.81 22.6
## 92   8.20 22.0
## 93   8.16 22.9
## 94   6.21 25.0
## 95  10.59 20.6
## 96   6.65 28.4
## 97  11.34 21.4
## 98   4.21 38.7
## 99   3.57 43.8
## 100  6.19 33.2
## 101  9.42 27.5
## 102  7.67 26.5
## 103 10.63 18.6
## 104 13.44 19.3
## 105 12.33 20.1
## 106 16.47 19.5
## 107 18.66 19.5
## 108 14.09 20.4
## 109 12.27 19.8
## 110 15.55 19.4
## 111 13.00 21.7
## 112 10.16 22.8
## 113 16.21 18.8
## 114 17.09 18.7
## 115 10.45 18.5
## 116 15.76 18.3
## 117 12.04 21.2
## 118 10.30 19.2
## 119 15.37 20.4
## 120 13.61 19.3
## 121 14.37 22.0
## 122 14.27 20.3
## 123 17.93 20.5
## 124 25.41 17.3
## 125 17.58 18.8
## 126 14.81 21.4
## 127 27.26 15.7
## 128 17.19 16.2
## 129 15.39 18.0
## 130 18.34 14.3
## 131 12.60 19.2
## 132 12.26 19.6
## 133 11.12 23.0
## 134 15.03 18.4
## 135 17.31 15.6
## 136 16.96 18.1
## 137 16.90 17.4
## 138 14.59 17.1
## 139 21.32 13.3
## 140 18.46 17.8
## 141 24.16 14.0
## 142 34.41 14.4
## 143 26.82 13.4
## 144 26.42 15.6
## 145 29.29 11.8
## 146 27.80 13.8
## 147 16.65 15.6
## 148 29.53 14.6
## 149 28.32 17.8
## 150 21.45 15.4
## 151 14.10 21.5
## 152 13.28 19.6
## 153 12.12 15.3
## 154 15.79 19.4
## 155 15.12 17.0
## 156 15.02 15.6
## 157 16.14 13.1
## 158  4.59 41.3
## 159  6.43 24.3
## 160  7.39 23.3
## 161  5.50 27.0
## 162  1.73 50.0
## 163  1.92 50.0
## 164  3.32 50.0
## 165 11.64 22.7
## 166  9.81 25.0
## 167  3.70 50.0
## 168 12.14 23.8
## 169 11.10 23.8
## 170 11.32 22.3
## 171 14.43 17.4
## 172 12.03 19.1
## 173 14.69 23.1
## 174  9.04 23.6
## 175  9.64 22.6
## 176  5.33 29.4
## 177 10.11 23.2
## 178  6.29 24.6
## 179  6.92 29.9
## 180  5.04 37.2
## 181  7.56 39.8
## 182  9.45 36.2
## 183  4.82 37.9
## 184  5.68 32.5
## 185 13.98 26.4
## 186 13.15 29.6
## 187  4.45 50.0
## 188  6.68 32.0
## 189  4.56 29.8
## 190  5.39 34.9
## 191  5.10 37.0
## 192  4.69 30.5
## 193  2.87 36.4
## 194  5.03 31.1
## 195  4.38 29.1
## 196  2.97 50.0
## 197  4.08 33.3
## 198  8.61 30.3
## 199  6.62 34.6
## 200  4.56 34.9
## 201  4.45 32.9
## 202  7.43 24.1
## 203  3.11 42.3
## 204  3.81 48.5
## 205  2.88 50.0
## 206 10.87 22.6
## 207 10.97 24.4
## 208 18.06 22.5
## 209 14.66 24.4
## 210 23.09 20.0
## 211 17.27 21.7
## 212 23.98 19.3
## 213 16.03 22.4
## 214  9.38 28.1
## 215 29.55 23.7
## 216  9.47 25.0
## 217 13.51 23.3
## 218  9.69 28.7
## 219 17.92 21.5
## 220 10.50 23.0
## 221  9.71 26.7
## 222 21.46 21.7
## 223  9.93 27.5
## 224  7.60 30.1
## 225  4.14 44.8
## 226  4.63 50.0
## 227  3.13 37.6
## 228  6.36 31.6
## 229  3.92 46.7
## 230  3.76 31.5
## 231 11.65 24.3
## 232  5.25 31.7
## 233  2.47 41.7
## 234  3.95 48.3
## 235  8.05 29.0
## 236 10.88 24.0
## 237  9.54 25.1
## 238  4.73 31.5
## 239  6.36 23.7
## 240  7.37 23.3
## 241 11.38 22.0
## 242 12.40 20.1
## 243 11.22 22.2
## 244  5.19 23.7
## 245 12.50 17.6
## 246 18.46 18.5
## 247  9.16 24.3
## 248 10.15 20.5
## 249  9.52 24.5
## 250  6.56 26.2
## 251  5.90 24.4
## 252  3.59 24.8
## 253  3.53 29.6
## 254  3.54 42.8
## 255  6.57 21.9
## 256  9.25 20.9
## 257  3.11 44.0
## 258  5.12 50.0
## 259  7.79 36.0
## 260  6.90 30.1
## 261  9.59 33.8
## 262  7.26 43.1
## 263  5.91 48.8
## 264 11.25 31.0
## 265  8.10 36.5
## 266 10.45 22.8
## 267 14.79 30.7
## 268  7.44 50.0
## 269  3.16 43.5
## 270 13.65 20.7
## 271 13.00 21.1
## 272  6.59 25.2
## 273  7.73 24.4
## 274  6.58 35.2
## 275  3.53 32.4
## 276  2.98 32.0
## 277  6.05 33.2
## 278  4.16 33.1
## 279  7.19 29.1
## 280  4.85 35.1
## 281  3.76 45.4
## 282  4.59 35.4
## 283  3.01 46.0
## 284  3.16 50.0
## 285  7.85 32.2
## 286  8.23 22.0
## 287 12.93 20.1
## 288  7.14 23.2
## 289  7.60 22.3
## 290  9.51 24.8
## 291  3.33 28.5
## 292  3.56 37.3
## 293  4.70 27.9
## 294  8.58 23.9
## 295 10.40 21.7
## 296  6.27 28.6
## 297  7.39 27.1
## 298 15.84 20.3
## 299  4.97 22.5
## 300  4.74 29.0
## 301  6.07 24.8
## 302  9.50 22.0
## 303  8.67 26.4
## 304  4.86 33.1
## 305  6.93 36.1
## 306  8.93 28.4
## 307  6.47 33.4
## 308  7.53 28.2
## 309  4.54 22.8
## 310  9.97 20.3
## 311 12.64 16.1
## 312  5.98 22.1
## 313 11.72 19.4
## 314  7.90 21.6
## 315  9.28 23.8
## 316 11.50 16.2
## 317 18.33 17.8
## 318 15.94 19.8
## 319 10.36 23.1
## 320 12.73 21.0
## 321  7.20 23.8
## 322  6.87 23.1
## 323  7.70 20.4
## 324 11.74 18.5
## 325  6.12 25.0
## 326  5.08 24.6
## 327  6.15 23.0
## 328 12.79 22.2
## 329  9.97 19.3
## 330  7.34 22.6
## 331  9.09 19.8
## 332 12.43 17.1
## 333  7.83 19.4
## 334  5.68 22.2
## 335  6.75 20.7
## 336  8.01 21.1
## 337  9.80 19.5
## 338 10.56 18.5
## 339  8.51 20.6
## 340  9.74 19.0
## 341  9.29 18.7
## 342  5.49 32.7
## 343  8.65 16.5
## 344  7.18 23.9
## 345  4.61 31.2
## 346 10.53 17.5
## 347 12.67 17.2
## 348  6.36 23.1
## 349  5.99 24.5
## 350  5.89 26.6
## 351  5.98 22.9
## 352  5.49 24.1
## 353  7.79 18.6
## 354  4.50 30.1
## 355  8.05 18.2
## 356  5.57 20.6
## 357 17.60 17.8
## 358 13.27 21.7
## 359 11.48 22.7
## 360 12.67 22.6
## 361  7.79 25.0
## 362 14.19 19.9
## 363 10.19 20.8
## 364 14.64 16.8
## 365  5.29 21.9
## 366  7.12 27.5
## 367 14.00 21.9
## 368 13.33 23.1
## 369  3.26 50.0
## 370  3.73 50.0
## 371  2.96 50.0
## 372  9.53 50.0
## 373  8.88 50.0
## 374 34.77 13.8
## 375 37.97 13.8
## 376 13.44 15.0
## 377 23.24 13.9
## 378 21.24 13.3
## 379 23.69 13.1
## 380 21.78 10.2
## 381 17.21 10.4
## 382 21.08 10.9
## 383 23.60 11.3
## 384 24.56 12.3
## 385 30.63  8.8
## 386 30.81  7.2
## 387 28.28 10.5
## 388 31.99  7.4
## 389 30.62 10.2
## 390 20.85 11.5
## 391 17.11 15.1
## 392 18.76 23.2
## 393 25.68  9.7
## 394 15.17 13.8
## 395 16.35 12.7
## 396 17.12 13.1
## 397 19.37 12.5
## 398 19.92  8.5
## 399 30.59  5.0
## 400 29.97  6.3
## 401 26.77  5.6
## 402 20.32  7.2
## 403 20.31 12.1
## 404 19.77  8.3
## 405 27.38  8.5
## 406 22.98  5.0
## 407 23.34 11.9
## 408 12.13 27.9
## 409 26.40 17.2
## 410 19.78 27.5
## 411 10.11 15.0
## 412 21.22 17.2
## 413 34.37 17.9
## 414 20.08 16.3
## 415 36.98  7.0
## 416 29.05  7.2
## 417 25.79  7.5
## 418 26.64 10.4
## 419 20.62  8.8
## 420 22.74  8.4
## 421 15.02 16.7
## 422 15.70 14.2
## 423 14.10 20.8
## 424 23.29 13.4
## 425 17.16 11.7
## 426 24.39  8.3
## 427 15.69 10.2
## 428 14.52 10.9
## 429 21.52 11.0
## 430 24.08  9.5
## 431 17.64 14.5
## 432 19.69 14.1
## 433 12.03 16.1
## 434 16.22 14.3
## 435 15.17 11.7
## 436 23.27 13.4
## 437 18.05  9.6
## 438 26.45  8.7
## 439 34.02  8.4
## 440 22.88 12.8
## 441 22.11 10.5
## 442 19.52 17.1
## 443 16.59 18.4
## 444 18.85 15.4
## 445 23.79 10.8
## 446 23.98 11.8
## 447 17.79 14.9
## 448 16.44 12.6
## 449 18.13 14.1
## 450 19.31 13.0
## 451 17.44 13.4
## 452 17.73 15.2
## 453 17.27 16.1
## 454 16.74 17.8
## 455 18.71 14.9
## 456 18.13 14.1
## 457 19.01 12.7
## 458 16.94 13.5
## 459 16.23 14.9
## 460 14.70 20.0
## 461 16.42 16.4
## 462 14.65 17.7
## 463 13.99 19.5
## 464 10.29 20.2
## 465 13.22 21.4
## 466 14.13 19.9
## 467 17.15 19.0
## 468 21.32 19.1
## 469 18.13 19.1
## 470 14.76 20.1
## 471 16.29 19.9
## 472 12.87 19.6
## 473 14.36 23.2
## 474 11.66 29.8
## 475 18.14 13.8
## 476 24.10 13.3
## 477 18.68 16.7
## 478 24.91 12.0
## 479 18.03 14.6
## 480 13.11 21.4
## 481 10.74 23.0
## 482  7.74 23.7
## 483  7.01 25.0
## 484 10.42 21.8
## 485 13.34 20.6
## 486 10.58 21.2
## 487 14.98 19.1
## 488 11.45 20.6
## 489 18.06 15.2
## 490 23.97  7.0
## 491 29.68  8.1
## 492 18.07 13.6
## 493 13.35 20.1
## 494 12.01 21.8
## 495 13.59 24.5
## 496 17.60 23.1
## 497 21.14 19.7
## 498 14.10 18.3
## 499 12.92 21.2
## 500 15.10 17.5
## 501 14.33 16.8
## 502  9.67 22.4
## 503  9.08 20.6
## 504  5.64 23.9
## 505  6.48 22.0
## 506  7.88 11.9
colSums(is.na(Boston))
##    crim      zn   indus    chas     nox      rm     age     dis     rad     tax 
##       0       0       0       0       0       0       0       0       0       0 
## ptratio   black   lstat    medv 
##       0       0       0       0

The output shows that none of the columns contains missing values.

Let's again split our data in two: a training set (70%) and a test set (30%).

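# Fix the random seed so the split is reproducible
set.seed(123)
library(caTools)
# Logical vector: TRUE for rows assigned to the training set (about 70%)
split <- sample.split(Boston$medv, SplitRatio = 0.7)
train_data <- subset(Boston, split == TRUE)
test_data <- subset(Boston, split == FALSE)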
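
The same kind of split can also be sketched in base R, without `caTools`. The block below is only an illustration under stated assumptions: `toy` is a hypothetical simulated data frame with the same number of rows as `Boston`, and `sample()` draws a plain random 70% of the row indices, which is an alternative to `sample.split()` and not a description of how that function works internally.

```r
set.seed(123)

# Hypothetical toy data standing in for Boston (506 rows)
toy <- data.frame(x = rnorm(506))
toy$y <- 2 * toy$x + rnorm(506)

n <- nrow(toy)

# Draw 70% of the row indices at random for the training set
train_idx <- sample(seq_len(n), size = floor(0.7 * n))

toy_train <- toy[train_idx, ]   # rows used to fit the model
toy_test  <- toy[-train_idx, ]  # remaining rows, held out for evaluation

nrow(toy_train)
nrow(toy_test)
```

The two parts never share a row, which is what makes the test-set metrics an honest estimate of out-of-sample performance.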

Let's fit the model: a linear regression of medv on all the other variables, using the training data.

model_lm <- lm(medv ~ ., data = train_data)
summary(model_lm)
## 
## Call:
## lm(formula = medv ~ ., data = train_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.5797  -2.6006  -0.5752   2.1161  18.1208 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  33.765376   5.622702   6.005 4.75e-09 ***
## crim         -0.117755   0.033906  -3.473 0.000579 ***
## zn            0.044972   0.014919   3.014 0.002761 ** 
## indus         0.040452   0.066552   0.608 0.543693    
## chas          2.457784   0.963830   2.550 0.011194 *  
## nox         -19.711311   4.168890  -4.728 3.28e-06 ***
## rm            4.618992   0.440164  10.494  < 2e-16 ***
## age          -0.008863   0.013928  -0.636 0.524931    
## dis          -1.559286   0.219516  -7.103 6.76e-12 ***
## rad           0.226005   0.072663   3.110 0.002021 ** 
## tax          -0.010045   0.004202  -2.390 0.017351 *  
## ptratio      -0.933469   0.142952  -6.530 2.29e-10 ***
## black         0.005653   0.002966   1.906 0.057479 .  
## lstat        -0.500263   0.055431  -9.025  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.418 on 353 degrees of freedom
## Multiple R-squared:  0.7905, Adjusted R-squared:  0.7828 
## F-statistic: 102.5 on 13 and 353 DF,  p-value: < 2.2e-16

Now let's generate predictions for the test set and compute the error metrics.

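# Predict medv for the held-out test rows
predictions <- predict(model_lm, test_data)
# Mean squared error and its square root
mse <- mean((test_data$medv - predictions)^2)
rmse <- sqrt(mse)
# Residual and total sums of squares on the test set
rss <- sum((test_data$medv - predictions)^2)
tss <- sum((test_data$medv - mean(test_data$medv))^2)
r_squared <- 1 - (rss/tss)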
cat("MSE: ", mse, "\n")
## MSE:  31.88877
cat("RMSE: ", rmse, "\n")
## RMSE:  5.647014
cat("R-squared: ", r_squared, "\n")
## R-squared:  0.5366081

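We have already covered R-squared: it is the share of the variation in the dependent variable that our predictors explain. On the test set it is about 0.54, noticeably lower than the 0.79 we obtained on the training data.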
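
As a quick check of the formula `1 - rss/tss`, the sketch below computes R-squared by hand on the data a model was fitted to and compares it with the value reported by `summary()`. The variables `x`, `y`, and `fit` are hypothetical simulated examples, not part of the Boston analysis.

```r
set.seed(1)

# Hypothetical simulated data: y depends linearly on x plus noise
x <- rnorm(100)
y <- 3 + 2 * x + rnorm(100)

fit <- lm(y ~ x)

# Residual sum of squares and total sum of squares on the fitting data
rss_fit <- sum(residuals(fit)^2)
tss_fit <- sum((y - mean(y))^2)

r2_manual <- 1 - rss_fit / tss_fit

# For a model with an intercept, this matches the R-squared from summary()
all.equal(r2_manual, summary(fit)$r.squared)
```

The equality holds on the data used for fitting. On new data, the same formula can give a lower value, as in the test-set result above.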

MSE is the mean of the squared differences between the actual values and the predicted values. The lower it is, the better.

RMSE is the square root of MSE, so it expresses the error in the same units as the dependent variable. That makes it easier to interpret than MSE. In our example the RMSE is about 5.65. Because medv is recorded in thousands of dollars, this means our predictions typically miss the actual values by roughly $5,650.
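
A small worked example may make the two definitions concrete. The `actual` values are the first four medv values from the data printed above. The `predicted` values are made up for illustration and do not come from `model_lm`.

```r
# First four medv values from the data (in thousands of dollars)
actual <- c(24.0, 21.6, 34.7, 33.4)

# Hypothetical predictions, chosen so the errors are -1, 1, 2 and -2
predicted <- c(25.0, 20.6, 32.7, 35.4)

errors <- actual - predicted

mse_example  <- mean(errors^2)    # (1 + 1 + 4 + 4) / 4 = 2.5
rmse_example <- sqrt(mse_example) # square root of 2.5, about 1.58

# Convert RMSE from thousands of dollars to dollars
rmse_dollars <- rmse_example * 1000

mse_example
rmse_example
rmse_dollars
```

MSE is in squared units (thousands of dollars, squared), which is hard to read directly. Taking the square root brings the number back to the scale of medv.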

A good model should have low RMSE and MSE values and a high R-squared. These metrics are especially useful when comparing two models, because they show which one predicts better.
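
The comparison can be sketched as follows. Everything in this block is an assumption made for illustration: `sim` is simulated data, `rmse_of()` is a hypothetical helper, and the two models differ only in whether they include the second predictor.

```r
set.seed(42)

# Hypothetical simulated data: y depends on both x1 and x2
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + 3 * sim$x2 + rnorm(n)

# 70/30 train/test split
idx <- sample(seq_len(n), size = 210)
sim_train <- sim[idx, ]
sim_test  <- sim[-idx, ]

# Hypothetical helper: RMSE of a model's predictions on new data
rmse_of <- function(model, newdata) {
  sqrt(mean((newdata$y - predict(model, newdata))^2))
}

fit_small <- lm(y ~ x1, data = sim_train)       # omits x2
fit_full  <- lm(y ~ x1 + x2, data = sim_train)  # includes both predictors

# Evaluate both models on the same held-out rows
c(small = rmse_of(fit_small, sim_test),
  full  = rmse_of(fit_full, sim_test))
```

Both models are scored on the same test rows, so the difference in RMSE reflects the models themselves and not a difference in the data they were evaluated on.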

END